Abstract: We consider symbolic tree automata (sta) and symbolic tree
transducers (stt). We characterize s-recognizable tree languages (which are
the tree languages recognizable by sta) in terms of (classical) recognizable
tree languages and relabelings. We prove that sta and the recently introduced
variable tree automata are incomparable with respect to their recognition
power. We define symbolic regular tree grammars and characterize
s-regular tree languages in terms of regular tree languages and relabelings.
As a consequence, we obtain that s-recognizable tree languages are the
same as s-regular tree languages.

We show that the syntactic composition of two stt computes the composition
of the tree transformations computed by each stt, provided that (1)
the first one is deterministic or the second one is linear and (2) the first
one is total or the second is nondeleting. We consider forward application
and backward application of stt and prove that the backward application
of an stt to any s-recognizable tree language yields an s-recognizable tree
language. We give a linear stt of which the range is not an s-recognizable
tree language. We show that the forward application of simple and linear
stt preserves s-recognizability. As a corollary, we obtain that the type
checking problem of simple and linear stt and the inverse type checking
problem of arbitrary stt are decidable.

1 Introduction

Symbolic tree automata (sta) and symbolic tree transducers (stt) were introduced in
[VB11a] and [VB11b]. They differ from classical finite-state tree automata and tree
transducers [GS84, GS97] in that they work with trees over an infinite, unranked set
of symbols. According to [GKS10], examples of systems with finite control and infinite
source of data are software with integer parameters [BHM03], datalog systems with
infinite data domain [BHJS07], and XML documents of which the leaves are associated
with data values from some infinite domain [BCC+03]. It was mentioned in [VB11a]
that lifting the finite alphabet restriction is useful to enable efficient symbolic analysis.
Symbolic transducers are useful for exploring symbolic solvers when performing basic
automata-theoretic transformations [VHL+12].

In this paper we provide new formal definitions of sta and stt which slightly differ
from those given in [VB11a, VB11b]. At the end of Sections 3.1 and 5.1 we will
compare our definitions with the original ones.

Roughly speaking, an sta is a finite-state tree automaton [Don70] except that the
input trees are built up over an infinite set of labels. In order to ensure a finite
description of the potentially infinite set of transitions we bound the maximal number
of the successors of any node occurring in an input tree by an integer $k \in \mathbb{N}$, and
we employ finitely many unary Boolean-valued predicates over the set of labels. Then
every transition of a symbolic k-bounded tree automaton (sk-ta) has the form

$(q_1 \ldots q_l, \varphi, q)$

where $0 \le l \le k$, $q, q_1, \ldots, q_l$ are states, and $\varphi$ is a unary Boolean-valued predicate.
Such a transition is applicable to a node if $\varphi$ holds for the label of that node. The
tree language $L(\mathcal{A})$ recognized by an sta $\mathcal{A}$ is defined as the union of all tree languages
$L(\mathcal{A}, q)$, where q is a final state, and the family $(L(\mathcal{A}, q) \mid q \in Q)$ is defined inductively
in the same way as for finite-state tree automata. A tree language is sk-recognizable
if there is an sk-ta which recognizes this language, and it is s-recognizable if it is
sk-recognizable for some $k \in \mathbb{N}$. An example of an s2-recognizable tree language is the
set of all binary trees with labels taken from $\mathbb{N}$ such that every label is divisible by 2
or every label is divisible by 3 as, e.g., 2(4, 6) or 3(15, 18) (cf. Example 3.2).

By restricting the set of labels to a ranked alphabet $\Sigma$ and just allowing, for every
$\sigma \in \Sigma$, the characteristic mapping on $\{\sigma\}$ as predicate, we reobtain the classical
finite-state tree automata. In [VB11a] it was proved that bottom-up sta are determinizable,
that the class of s-recognizable tree languages is closed under the Boolean operations,
and that the emptiness problem for s-recognizable tree languages is decidable provided
the emptiness problem in the Boolean algebra of predicates is decidable.

Similarly, an stt is a top-down tree transducer [Tha70, Rou70, Eng75] except that
its input and output trees are built up over potentially infinite sets of (resp., input
and output) labels. In the same way as for sta, we ensure finiteness by an a priori
bound k on the maximal number of the successors of a node and by using a finite set
of unary predicates. The right-hand side of each rule of a symbolic k-bounded tree
transducer (sk-tt) contains unary functions, rather than explicit output symbols as in
top-down tree transducers. These functions are then applied to the current input label
and thereby produce the output labels. More formally, a rule has the form

$q(\varphi(x_1, \ldots, x_l)) \to u$

where $0 \le l \le k$, q is a state, $\varphi$ is a unary Boolean-valued predicate over the set of
input labels, $x_1, \ldots, x_l$ are the usual variables that represent input subtrees, and u is
a tree in which each internal node has at most k successors and is labeled by a unary
function symbol; the leaves of u can be labeled alternatively by objects $q'(x_i)$ with
state $q'$ and $x_i \in \{x_1, \ldots, x_l\}$. Clearly, the leaf labels of the form $q'(x_i)$ organize the
recursive descent on the input tree as usual in a top-down tree transducer. The tree
transformation computed by an stt is defined in the obvious way by means of a binary
derivation relation. For instance, there is a (nondeterministic) s2-tt which transforms
each binary tree over $\mathbb{N}$ into a set of binary trees over $\mathbb{N}$ such that a subtree $n(\xi_1, \xi_2)$
of the input tree is transformed into $m(\xi_1', \xi_2')$ where

• $m = n$, and $\xi_1'$ and $\xi_2'$ are transformations of $\xi_1$ and $\xi_2$, respectively, or

• $m = n/6$ if n is divisible by 6, and both $\xi_1'$ and $\xi_2'$ are transformations of $\xi_1$ (cf.
Example 5.3).

By restricting the predicates on the input labels to some ranked alphabet (as for sta
above) and by only allowing unary functions such that each one produces a constant
symbol from some ranked (output) alphabet, we reobtain top-down tree transducers.

Since sta and stt can check and manipulate data from an infinite set, they can be
considered as tools for analyzing and transforming trees as they occur, e.g., in XML
documents. Thus, the theoretical investigation of sta and stt is motivated by practical
problems such as type checking and inverse type checking.

In this paper we further develop the theory of sta and stt. We prove a characterization
of s-recognizable tree languages in terms of (classical) recognizable tree languages
and relabelings (Thm. 3.5). We compare the recognition power of sta with that of
variable tree automata [MRU] (also cf. [GKS10]). More specifically, we characterize
the tree language recognized by a variable tree automaton by the union of infinitely
many s-recognizable tree languages (Prop. 3.8) and we show that sta and variable tree
automata are incomparable with respect to recognition power (cf. Thm. 3.9). Moreover,
as a generalization of (classical) regular tree grammars [Bra69] we introduce
symbolic regular tree grammars and characterize s-regular tree languages in terms of
regular tree languages and relabelings (Thm. 4.3). As a corollary, we obtain that
s-recognizable tree languages are the same as s-regular tree languages (Thm. 4.4).

For stt we recall the concept of the syntactic composition from [VB11b]. We show
that syntactic composition of two stt M and N computes the composition of the tree
transformations computed by M and N, provided that (1) M is deterministic or N is
linear, and (2) M is total or N is nondeleting (Thm. 5.8). Hereby, we generalize Baker's
classical result [Bak79, Thm. 1].

Finally, we consider forward application and backward application of stt; these
investigations are motivated by the (inverse) type checking problem (see among others
[MSV03, AMN+03, EM03, MBPS05]). We show that the backward application of
an sk-tt (which is the application of its inverse) to any sk-recognizable tree language
yields an sk-recognizable tree language (Thm. 6.3). It is well-known that the forward
application of linear top-down tree transducers preserves recognizability of tree
languages (see e.g. [Tha69] or [GS84, Ch. IV, Cor. 6.6]). It is surprising that for stt the
corresponding result does not hold; in fact, there is a linear sk-tt of which the range
is not an sk-recognizable tree language (Lm. 6.4). However, the application of simple
and linear stt preserves s-recognizability (Thm. 6.5). As a corollary, we obtain that the
type checking problem of simple and linear stt as well as the inverse type checking
problem of arbitrary stt are decidable (Thm. 6.7).

Since the theory of sta and stt is based on concepts which are slightly different from
the foundations of classical finite-state tree automata and tree transducers, we list
them in detail in Section 2.

2 Preliminaries 

2.1 General 

The set of nonnegative integers is denoted by N. 

For a set A, we denote by $|A|$ and $\mathcal{P}(A)$ the cardinality and the set of all subsets of
A, respectively. Moreover, we denote by $1_A$ the identity mapping over A. For a set I, an
I-indexed family over A is a mapping $f : I \to A$. We denote the family f also by $(f_i \mid i \in I)$.

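Let $\rho \subseteq A \times B$ be a relation. For every $A' \subseteq A$, we define
$\rho(A') = \{b \in B \mid (a, b) \in \rho \text{ for some } a \in A'\}$. For another relation
$\sigma \subseteq B \times C$, the composition of $\rho$ and $\sigma$ is the relation
$\rho \circ \sigma = \{(a, c) \mid \exists (b \in B) : (a, b) \in \rho \text{ and } (b, c) \in \sigma\}$. The reflexive and
transitive closure of a relation $\rho \subseteq A \times A$ is denoted by $\rho^*$.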
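
The three operations on relations can be illustrated for finite relations. The following is only a minimal sketch: relations are assumed to be finite Python sets of pairs, and the function names `image`, `compose`, and `reflexive_transitive_closure` are hypothetical names chosen for this illustration.

```python
def image(rho, subset):
    """Return rho(A') = {b | (a, b) in rho for some a in A'}."""
    return {b for (a, b) in rho if a in subset}


def compose(rho, sigma):
    """Return rho o sigma = {(a, c) | exists b: (a, b) in rho and (b, c) in sigma}."""
    return {(a, c) for (a, b) in rho for (b2, c) in sigma if b == b2}


def reflexive_transitive_closure(rho, universe):
    """Return rho* for a relation rho over the finite set `universe`."""
    closure = {(a, a) for a in universe} | set(rho)
    while True:
        # Add all pairs reachable by chaining two pairs already present.
        extra = compose(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra


rho = {(1, "x"), (2, "y")}
sigma = {("x", "p"), ("y", "q")}
print(image(rho, {1}))
print(compose(rho, sigma))
```

Note that the composition is written in diagrammatic order, as in the text: first $\rho$, then $\sigma$.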

2.2 Trees 

In this paper we mainly consider trees over a nonempty and unranked set. We note
that our concept of a tree differs from that of [VB11a, VB11b] in that we do not
consider the empty tree as the base of the inductive definition.

Let U be a (possibly infinite) nonempty set, called the set of labels, and Y a further
set. The set of trees over U (or: U-trees) indexed by Y, denoted by $T_U(Y)$, is the
smallest subset T of $(U \cup Y \cup \{(, )\} \cup \{,\})^*$ such that (i) $(U \cup Y) \subseteq T$, and (ii) if $a \in U$
and $\xi_1, \ldots, \xi_l \in T$ with $l \ge 1$, then $a(\xi_1, \ldots, \xi_l) \in T$. If $Y = \emptyset$, then we write $T_U$ for
$T_U(Y)$. A tree language over U (or: U-tree language) is any subset of $T_U$.

Let Q be a set with $Q \cap U = \emptyset$. Then we denote by $Q(T_U(Y))$ the subset
$\{q(\xi) \mid q \in Q, \xi \in T_U(Y)\}$ of $T_{Q \cup U}(Y)$.

We define the set of positions in a U-tree by means of the mapping $\mathrm{pos} : T_U(Y) \to \mathcal{P}(\mathbb{N}^*)$
inductively on the argument $\xi \in T_U(Y)$ as follows: (i) if $\xi \in (U \cup Y)$, then
$\mathrm{pos}(\xi) = \{\varepsilon\}$, and (ii) if $\xi = a(\xi_1, \ldots, \xi_l)$ for some $a \in U$, $l \ge 1$ and $\xi_1, \ldots, \xi_l \in T_U(Y)$,
then $\mathrm{pos}(\xi) = \{\varepsilon\} \cup \{iv \mid 1 \le i \le l, v \in \mathrm{pos}(\xi_i)\}$.

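For every $\xi \in T_U(Y)$ and $w \in \mathrm{pos}(\xi)$, the label of $\xi$ at w, denoted by $\xi(w) \in (U \cup Y)$,
the subtree of $\xi$ at w, denoted by $\xi|_w \in T_U(Y)$, and the rank at w, denoted by $\mathrm{rk}_\xi(w) \in \mathbb{N}$,
are defined inductively as follows: (i) if $\xi \in (U \cup Y)$, then $\xi(\varepsilon) = \xi|_\varepsilon = \xi$, and
$\mathrm{rk}_\xi(\varepsilon) = 0$, and (ii) if $\xi = a(\xi_1, \ldots, \xi_l)$ for some $a \in U$, $l \ge 1$ and $\xi_1, \ldots, \xi_l \in T_U(Y)$,
then $\xi(\varepsilon) = a$, $\xi|_\varepsilon = \xi$, and $\mathrm{rk}_\xi(\varepsilon) = l$, and if $1 \le i \le l$ and $w = iv$, then
$\xi(w) = \xi_i(v)$, $\xi|_w = \xi_i|_v$, and $\mathrm{rk}_\xi(w) = \mathrm{rk}_{\xi_i}(v)$.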
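
To make the inductive definitions of positions, labels, subtrees, and ranks concrete, here is a small sketch. It assumes an encoding that is not part of the paper: a tree is a pair `(label, children)` with `children` a tuple of trees, and a position is a tuple of positive integers (the empty tuple plays the role of $\varepsilon$). The function names are hypothetical.

```python
def positions(tree):
    """Return pos(tree) as a set of integer tuples; () stands for epsilon."""
    _label, children = tree
    result = {()}
    for i, child in enumerate(children, start=1):
        result |= {(i,) + v for v in positions(child)}
    return result


def subtree(tree, w):
    """Return the subtree of `tree` at position w."""
    for i in w:
        tree = tree[1][i - 1]  # descend into the i-th successor
    return tree


def label(tree, w):
    """Return the label of `tree` at position w."""
    return subtree(tree, w)[0]


def rank_at(tree, w):
    """Return the rank at w, i.e. the number of successors of the node at w."""
    return len(subtree(tree, w)[1])


# The tree 2(4, 6) over the label set N.
t = (2, ((4, ()), (6, ())))
print(sorted(positions(t)))
print(label(t, (2,)), rank_at(t, ()))
```

The sketch does not distinguish labels in U from index symbols in Y; both are simply leaf or node labels in this encoding.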

Let $\xi \in T_U(Y)$ be a tree. For any $V \subseteq U$, we define $\mathrm{pos}_V(\xi) = \{w \in \mathrm{pos}(\xi) \mid
\xi(w) \in V\}$. If $V = \{a\}$, then we write just $\mathrm{pos}_a(\xi)$ for $\mathrm{pos}_V(\xi)$. Moreover, for every
$\zeta \in T_U(Y)$ and $w \in \mathrm{pos}(\xi)$, we denote by $\xi[\zeta]_w$ the tree which is obtained by replacing
the subtree $\xi|_w$ by $\zeta$.

We will consider trees with variables and the substitution of trees for variables.
For this, let $X = \{x_1, x_2, \ldots\}$ be an infinite set of variables, disjoint with U, and let
$X_l = \{x_1, \ldots, x_l\}$ for every $l \in \mathbb{N}$. For trees $\xi \in T_U(X_l)$ and $\zeta_1, \ldots, \zeta_l \in T_U(Y)$, we
denote by $\xi[\zeta_1, \ldots, \zeta_l]$ the tree which we obtain by replacing every occurrence of $x_i$ by
$\zeta_i$ for every $1 \le i \le l$. We note that $\xi[\zeta_1, \ldots, \zeta_l] \in T_U(Y)$. Moreover, we denote by
$C_U(X_l)$ the set of trees in $T_U(X_l)$ in which each variable $x_i$ occurs exactly once and
the order of variables from left to right is $x_1, \ldots, x_l$. We call the elements of $C_U(X_l)$
l-contexts.

Finally, let $\xi \in T_U(Y)$ and $k \in \mathbb{N}$. We define the rank $\mathrm{rk}(\xi)$ of $\xi$ to be $\mathrm{rk}(\xi) =
\max\{\mathrm{rk}_\xi(w) \mid w \in \mathrm{pos}(\xi)\}$ and we say that $\xi$ is k-bounded if $\mathrm{rk}(\xi) \le k$. We denote the
set of all k-bounded U-trees indexed by Y by $T_U^{(k)}(Y)$. Clearly, $T_U^{(k)}(Y) \subseteq T_U^{(k+1)}(Y)$.
A k-bounded U-tree language (or: (U, k)-tree language) is a subset of $T_U^{(k)}$. A U-tree
language L is bounded if there is a $k \in \mathbb{N}$ such that L is k-bounded. Moreover, we
define the set of k-bounded l-contexts to be $C_U^{(k)}(X_l) = C_U(X_l) \cap T_U^{(k)}(X_l)$.
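
As an illustration of the rank of a tree and of k-boundedness, consider the following sketch. It assumes the same hypothetical encoding of a tree as a pair `(label, children)`; the names `rank` and `is_k_bounded` are chosen for this example only.

```python
def rank(tree):
    """Return rk(tree): the maximal number of successors of any node."""
    _label, children = tree
    return max([len(children)] + [rank(child) for child in children])


def is_k_bounded(tree, k):
    """Check whether rk(tree) <= k."""
    return rank(tree) <= k


# The tree a(b, c(d)): the root has two successors, the node c has one.
t = ("a", (("b", ()), ("c", (("d", ()),))))
print(rank(t), is_k_bounded(t, 1), is_k_bounded(t, 2))
```

The inclusion $T_U^{(k)}(Y) \subseteq T_U^{(k+1)}(Y)$ is visible here: a tree accepted by `is_k_bounded(t, k)` is also accepted for every larger bound.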

In this paper U , V , and W will always denote arbitrary nonempty sets 
unless specified otherwise. 



2.3 Tree transformations 

Let $k \in \mathbb{N}$. A k-bounded tree transformation (or: k-tree transformation) is a mapping
$\tau : T_U^{(k)} \to \mathcal{P}(T_V^{(k)})$ (or, alternatively, a relation $\tau \subseteq T_U^{(k)} \times T_V^{(k)}$). A tree transformation
is a k-tree transformation for some $k \in \mathbb{N}$. If for every $\xi \in T_U^{(k)}$, there is exactly one
$\zeta \in T_V^{(k)}$ such that $(\xi, \zeta) \in \tau$ (i.e., $\tau$ is a mapping), then we also write $\tau : T_U^{(k)} \to T_V^{(k)}$.

The inverse $\tau^{-1}$, the domain $\mathrm{dom}(\tau)$, and the range $\mathrm{range}(\tau)$ of a tree transformation
$\tau$ are defined in the standard way.

Let $\tau \subseteq T_U^{(k)} \times T_V^{(k)}$ be a tree transformation, $L \subseteq T_U^{(k)}$ and $L' \subseteq T_V^{(k)}$ tree languages.
The forward application (or just: application) of $\tau$ to L is the tree language $\tau(L) =
\{\zeta \in T_V^{(k)} \mid \exists (\xi \in L) : (\xi, \zeta) \in \tau\}$. The backward application of $\tau$ to $L'$ is the tree
language $\tau^{-1}(L')$ (which is the forward application of $\tau^{-1}$ to $L'$).

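We extend the above concepts and the composition of tree transformations to classes
of tree transformations and classes of tree languages in a natural way. For instance, if
$\mathcal{C}$ and $\mathcal{C}'$ are classes of k-tree transformations, and $\mathcal{L}$ is a class of k-tree languages, then
we define $\mathcal{C} \circ \mathcal{C}' = \{\tau \circ \sigma \mid \tau \in \mathcal{C} \text{ and } \sigma \in \mathcal{C}'\}$ and $\mathcal{C}(\mathcal{L}) = \{\tau(L) \mid \tau \in \mathcal{C} \text{ and } L \in \mathcal{L}\}$.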

A relabeling is a mapping $\tau : U \to \mathcal{P}(V)$ such that $\tau(a)$ is recursive and it is
decidable if $\tau(a) = \emptyset$ for every $a \in U$; it is called deterministic if $\tau(a)$ is a singleton
for every $a \in U$. Let $k \in \mathbb{N}$. The k-tree relabeling (induced by $\tau$) is the mapping
$\tau' : T_U^{(k)} \to \mathcal{P}(T_V^{(k)})$, defined by

$\tau'(a(\xi_1, \ldots, \xi_l)) = \{b(\zeta_1, \ldots, \zeta_l) \mid b \in \tau(a) \text{ and } \zeta_i \in \tau'(\xi_i) \text{ for } 1 \le i \le l\}$.

Then the mapping $\tau'$ is extended to $\tau'' : \mathcal{P}(T_U^{(k)}) \to \mathcal{P}(T_V^{(k)})$ by $\tau''(L) = \bigcup_{\xi \in L} \tau'(\xi)$
for every $L \in \mathcal{P}(T_U^{(k)})$.

We note that the composition of two k-tree relabelings $\tau_1'$ and $\tau_2'$ is again a k-tree
relabeling. In fact, if $\tau_1 : U \to \mathcal{P}(V)$ and $\tau_2 : V \to \mathcal{P}(W)$, then $\tau_1 \circ \tau_2$ induces $\tau_1' \circ \tau_2'$.
In the sequel, we drop the primes from $\tau'$ and $\tau''$ and identify both mappings with $\tau$.
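
The way a relabeling of labels induces a relabeling of trees and of tree languages can be sketched as follows. This is an illustration under simplifying assumptions: every set $\tau(a)$ is taken to be finite (the definition above only requires it to be recursive), trees are encoded as pairs `(label, children)`, and the names `relabel_tree` and `relabel_language` are hypothetical.

```python
from itertools import product


def relabel_tree(tau, tree):
    """Return tau'(tree) as a set of trees; tau maps a label to a finite set of labels."""
    label, children = tree
    child_images = [relabel_tree(tau, child) for child in children]
    # Choose a new root label and, independently, one image for every subtree.
    return {
        (b, combo)
        for b in tau(label)
        for combo in product(*child_images)
    }


def relabel_language(tau, language):
    """Return tau''(L), the union of tau'(tree) over all trees in L."""
    result = set()
    for tree in language:
        result |= relabel_tree(tau, tree)
    return result


def tau(symbol):
    # Hypothetical relabeling: 'e' may become 0 or 2, 'o' becomes 1.
    return {"e": {0, 2}, "o": {1}}[symbol]


print(relabel_tree(tau, ("e", (("o", ()),))))
```

The shape of a tree is never changed by this construction; only the labels are replaced, which is the property used later in the proof of Lemma 3.6.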

2.4 Predicates and label structures 

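A (unary) predicate over U is a mapping $\varphi : U \to \{0, 1\}$. We denote by $\mathrm{Pred}(U)$
the set of all predicates over U. Let $\varphi \in \mathrm{Pred}(U)$ be a predicate. We introduce the
notation $[\![\varphi]\!]$ for $\{a \in U \mid \varphi(a) = 1\}$.

We define the operations $\neg$, $\wedge$, and $\vee$ over $\mathrm{Pred}(U)$ in the obvious way and extend $\wedge$
and $\vee$ to finite families $(\varphi_i \mid i \in I)$ of predicates in $\mathrm{Pred}(U)$. In particular,
$[\![\bigwedge_{i \in \emptyset} \varphi_i]\!] = U$ and $[\![\bigvee_{i \in \emptyset} \varphi_i]\!] = \emptyset$.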

Let $\Phi \subseteq \mathrm{Pred}(U)$ be a finite set of recursive predicates such that $[\![\varphi]\!] = \emptyset$ is decidable
for every $\varphi \in \Phi$. We call the pair $(U, \Phi)$ a label structure. The Boolean closure of $\Phi$,
denoted by $\mathrm{BC}(\Phi)$, is the smallest set $B \subseteq \mathrm{Pred}(U)$ such that

(i) $\Phi \subseteq B$,

(ii) $\bot, \top \in B$ where $\top(a) = 1$ and $\bot(a) = 0$ for every $a \in U$, and

(iii) for every $\varphi, \psi \in B$, the predicates $\neg\varphi$, $\varphi \wedge \psi$, and $\varphi \vee \psi$ are in B.

It is clear that $(\mathrm{BC}(\Phi), \wedge, \vee, \neg, \bot, \top)$ is a Boolean algebra for every $\Phi \subseteq \mathrm{Pred}(U)$.
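
The Boolean operations on predicates can be sketched with ordinary Python functions. This is only an illustration: predicates are assumed to be callables returning `True` or `False`, the combinator names (`p_not`, `p_and`, `p_or`, `TOP`, `BOTTOM`, `div`, `denotation`) are hypothetical, and the decidability of emptiness required of a label structure is not modelled.

```python
def p_not(phi):
    """Negation of a predicate."""
    return lambda a: not phi(a)


def p_and(phi, psi):
    """Conjunction of two predicates."""
    return lambda a: phi(a) and psi(a)


def p_or(phi, psi):
    """Disjunction of two predicates."""
    return lambda a: phi(a) or psi(a)


def TOP(a):
    """The predicate that holds for every label."""
    return True


def BOTTOM(a):
    """The predicate that holds for no label."""
    return False


def div(i):
    """The predicate 'is divisible by i' over the natural numbers."""
    return lambda n: n % i == 0


def denotation(phi, sample):
    """Restrict [[phi]] to a finite sample of labels (the full set may be infinite)."""
    return {a for a in sample if phi(a)}


div6 = p_and(div(2), div(3))
print(denotation(div6, range(1, 13)))
```

Since the label set may be infinite, `denotation` only inspects a finite sample; it is a testing aid, not a decision procedure for emptiness.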

2.5 Tree automata, tree grammars, tree transducers 

We assume that the reader is familiar with the basic concepts of the theory of (classical)
tree automata and tree transducers which can be found among others in [GS84, GS97]
and [CDG+97]. In particular, we freely use the concept of a ranked alphabet, a tree
language over a ranked alphabet, a finite-state tree automaton, a recognizable tree
language, a regular tree grammar, a regular tree language, a top-down tree transducer,
and of a tree transformation. Here we recall only some notations.

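A ranked alphabet is a finite set $\Sigma$ equipped with a rank mapping $\mathrm{rk}_\Sigma : \Sigma \to \mathbb{N}$. We
define $\Sigma_l = \{\sigma \in \Sigma \mid \mathrm{rk}_\Sigma(\sigma) = l\}$ ($l \ge 0$) and $\mathrm{maxrk}(\Sigma) = \max\{\mathrm{rk}_\Sigma(\sigma) \mid \sigma \in \Sigma\}$. It is
clear that every tree $\xi \in T_\Sigma$ is $\mathrm{maxrk}(\Sigma)$-bounded.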

A finite-state tree automaton is a system $\mathcal{A} = (Q, \Sigma, \delta, F)$, where Q is a finite,
nonempty set (states), $\Sigma$ is a ranked alphabet, $\delta = (\delta_\sigma \mid \sigma \in \Sigma)$ is the family of sets of
transitions, i.e., $\delta_\sigma \subseteq Q^l \times Q$ for every $l \in \mathbb{N}$ and $\sigma \in \Sigma$ with $\mathrm{rk}_\Sigma(\sigma) = l$, and $F \subseteq Q$
is the set of final states. The set of trees recognized by $\mathcal{A}$ is denoted by $L(\mathcal{A})$. A tree
language $L \subseteq T_\Sigma$ is recognizable if there is a finite-state tree automaton $\mathcal{A}$ such that
$L = L(\mathcal{A})$.

A regular tree grammar is a tuple $\mathcal{G} = (Q, \Sigma, q_0, R)$ where Q is a finite set of states$^1$,
$\Sigma$ is a ranked alphabet, $q_0 \in Q$ (initial state), and R is a finite set of rules of the form
$q \to u$ with $q \in Q$ and $u \in T_\Sigma(Q)$. The derivation relation induced by $\mathcal{G}$ and the
tree language generated by $\mathcal{G}$ are denoted by $\Rightarrow_{\mathcal{G}}$ and $L(\mathcal{G})$, respectively. We will also
consider reduced regular tree grammars and regular tree grammars in normal form in
the sense of [CDG+97].

3 Symbolic tree automata 

In this section we formalize our adaptation of the concept of a symbolic tree automaton
from [VB11a] and compare our model with the original one. Then we prove basic
properties of sta. Finally, we compare the recognition capacity of sta with that of
variable tree automata.

3.1 Definition of sta 

Definition 3.1 Let $k \in \mathbb{N}$. A symbolic k-bounded tree automaton (sk-ta) is a tuple
$\mathcal{A} = (Q, U, \Phi, F, R)$ where

• Q is a finite, nonempty set (states),

• $(U, \Phi)$ is a label structure,

• $F \subseteq Q$ (set of final states), and

• R is a finite set of rules of the form $(q_1 \ldots q_l, \varphi, q)$ where $0 \le l \le k$,
$q_1, \ldots, q_l, q \in Q$, and $\varphi \in \mathrm{BC}(\Phi)$.

Let $\rho = (q_1 \ldots q_l, \varphi, q) \in R$. We call $(q_1 \ldots q_l)$ the left-hand side, $\varphi$ the guard, and
q the right-hand side of the rule $\rho$, and denote them by $\mathrm{lhs}(\rho)$, $\mathrm{grd}(\rho)$, and $\mathrm{rhs}(\rho)$,
respectively. Clearly, every sk-ta is an s(k+1)-ta. By a symbolic tree automaton (sta)
we mean an sk-ta for some $k \in \mathbb{N}$.

$^1$ Usually these symbols are called nonterminals; but since this notion leads to misunderstandings in
the application area of natural language processing, we prefer to call these symbols states.

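For every $q \in Q$, we define the tree language $L(\mathcal{A}, q) \subseteq T_U^{(k)}$ recognized by $\mathcal{A}$ in
state q, as follows. The family $(L(\mathcal{A}, q) \mid q \in Q)$ is the smallest Q-family $(L_q \mid q \in Q)$
of tree languages such that

(i) if $a \in U$, $(\varepsilon, \varphi, q) \in R$, and $a \in [\![\varphi]\!]$, then $a \in L_q$, and

(ii) if $a \in U$, $(q_1 \ldots q_l, \varphi, q) \in R$ with $1 \le l \le k$ and $a \in [\![\varphi]\!]$, and $\xi_1 \in L(\mathcal{A}, q_1), \ldots,
\xi_l \in L(\mathcal{A}, q_l)$, then $a(\xi_1, \ldots, \xi_l) \in L_q$.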

The condition that all predicates in $\Phi$ (and hence in $\mathrm{BC}(\Phi)$) are recursive ensures that
we can decide whether $\xi \in L(\mathcal{A}, q)$ for every $q \in Q$ and $\xi \in T_U^{(k)}$.
The tree language recognized by $\mathcal{A}$, denoted by $L(\mathcal{A})$, is the set

$L(\mathcal{A}) = \bigcup_{q \in F} L(\mathcal{A}, q)$.

A tree language $L \subseteq T_U^{(k)}$ is symbolically k-recognizable (sk-recognizable) if there is
an sk-ta $\mathcal{A}$ such that $L(\mathcal{A}) = L$. We denote the class of all sk-recognizable U-tree
languages by $\mathrm{REC}^{(k)}(U)$. Moreover, we call a tree language s-recognizable if it is
sk-recognizable for some $k \in \mathbb{N}$.

Two sk-ta $\mathcal{A}$ and $\mathcal{B}$ are equivalent if $L(\mathcal{A}) = L(\mathcal{B})$.

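Example 3.2 We give an example of an sta. For this we consider the set $U = \mathbb{N}$ and
the 2-bounded tree language

$L = \{\xi \in T_{\mathbb{N}}^{(2)} \mid \xi \text{ is binary and }
((\forall w \in \mathrm{pos}(\xi) : \xi(w) \text{ is divisible by } 2) \vee (\forall w \in \mathrm{pos}(\xi) : \xi(w) \text{ is divisible by } 3))\}$

where a tree $\xi$ is binary if $\mathrm{rk}_\xi(w) \in \{0, 2\}$ for every $w \in \mathrm{pos}(\xi)$. For instance, the trees
2(4, 6) and 3(15, 18) are in L.

The following s2-ta $\mathcal{A} = (Q, \mathbb{N}, \Phi, F, R)$ recognizes L:

• $Q = F = \{2, 3\}$,

• $\Phi = \{\mathrm{div}(2), \mathrm{div}(3)\}$ with $[\![\mathrm{div}(i)]\!] = \{n \in \mathbb{N} \mid n \text{ is divisible by } i\}$,

• for every $i \in \{2, 3\}$ the transitions $(\varepsilon, \mathrm{div}(i), i)$ and $(i\,i, \mathrm{div}(i), i)$ are in R.

For instance, $6(12, 18) \in L(\mathcal{A}, 2) \cap L(\mathcal{A}, 3)$.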
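
The automaton of Example 3.2 can be simulated by a bottom-up run that computes, for every subtree, the set of states in which it is recognized. The following is a sketch under assumptions that are not part of the paper: trees are encoded as pairs `(label, children)`, guards are Python callables, and the names `RULES`, `FINAL`, `states`, `accepts`, and `leaf` are hypothetical.

```python
def div(i):
    """The predicate div(i): 'is divisible by i'."""
    return lambda n: n % i == 0


# Rules (q_1 ... q_l, guard, q) of the s2-ta from Example 3.2.
RULES = []
for i in (2, 3):
    RULES.append(((), div(i), i))       # the rule (epsilon, div(i), i)
    RULES.append(((i, i), div(i), i))   # the rule (i i, div(i), i)
FINAL = {2, 3}


def states(tree, rules=RULES):
    """Return the set {q | tree in L(A, q)} by a bottom-up run."""
    label, children = tree
    child_states = [states(child, rules) for child in children]
    result = set()
    for lhs, guard, q in rules:
        if (len(lhs) == len(children) and guard(label)
                and all(qi in si for qi, si in zip(lhs, child_states))):
            result.add(q)
    return result


def accepts(tree, rules=RULES, final=FINAL):
    """Check whether the tree is recognized in some final state."""
    return bool(states(tree, rules) & final)


def leaf(n):
    """Build the tree consisting of the single node n."""
    return (n, ())


print(accepts((2, (leaf(4), leaf(6)))))
print(states((6, (leaf(12), leaf(18)))))
```

Computing the whole set of reachable states at each node is the usual subset-style evaluation; it avoids guessing a single run of the nondeterministic automaton.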

Our definition of sta slightly differs from the one in [VB11a] in the following two
points.

1. They fix a Boolean algebra B of predicates in advance, and then they make a
theory of sta only over B. We are free to choose predicates whenever we need them.

2. In [VB11a] no bound on the number of successors of nodes is mentioned. In our
definition we put an explicit bound on this number in order to guarantee closure of
sk-recognizable tree languages under complement.

We note that this closure under complement is not discussed clearly in [VB11a].
The root of the ambiguity is that the complement of a tree language, appearing in
Prop. 3 of that paper, is not defined. If the complement of a tree language L is
meant to be taken with respect to the set of all trees (as maybe is suggested by the
definition of the complement of a predicate [VB11a, p. 146]), which corresponds to
$T_U \setminus L$ in our notation, then the class of s-recognizable tree languages is not closed
under complement as stated in [VB11a, Prop. 3, Thm. 1]. This can be seen easily as
follows. Let L be an s-recognizable tree language (in the sense of [VB11a] or of the
present paper). Then obviously L is bounded, while the tree language $T_U \setminus L$ is not
bounded. Hence the latter cannot be s-recognizable.

However, if we define the complement of a k-bounded tree language L with respect
to $T_U^{(k)}$, i.e., to be $T_U^{(k)} \setminus L$, then the class of sk-recognizable tree languages is closed
under complement (by using the appropriate adaptations of [VB11a, Prop. 3, Thm. 1]).

3.2 Basic properties 

Here we give a characterization of s-recognizable tree languages in terms of (classical)
recognizable tree languages and tree relabelings. Moreover, we introduce uniform tree
languages and show that no uniform tree language whose trees have more than one
position is s-recognizable.

We will need the following obvious fact.

Observation 3.3 Both $\emptyset$ and the set $T_U^{(k)}$ are sk-recognizable for every set U and
$k \in \mathbb{N}$.

In the following we give a characterization of s-recognizable tree languages in terms
of recognizable tree languages and relabelings. First we prove the next lemma.

Lemma 3.4

1. For every sk-recognizable tree language L we can effectively construct a
k-bounded recognizable tree language $L'$ and a k-tree relabeling $\tau$ such that
$L = \tau(L')$.

2. For every k-bounded recognizable tree language $L'$ and k-tree relabeling $\tau$ we
can effectively construct an sk-recognizable tree language L such that $L = \tau(L')$.

Proof First assume that $L = L(\mathcal{A})$ for some sk-ta $\mathcal{A} = (Q, U, \Phi, F, R)$. We construct
the finite-state tree automaton $\mathcal{A}' = (Q, \Sigma, \delta, F)$, where

• $\Sigma_l = \{[\varphi, l] \mid (q_1 \ldots q_l, \varphi, q) \in R \text{ for some } q_1, \ldots, q_l, q \in Q\}$ for $0 \le l \le k$, and

• $\delta_{[\varphi, l]} = \{(q_1 \ldots q_l, q) \mid (q_1 \ldots q_l, \varphi, q) \in R\}$.

It should be clear that $L(\mathcal{A}')$ is k-bounded. Moreover, we define the relabeling
$\tau : \Sigma \to \mathcal{P}(U)$ by $\tau([\varphi, l]) = [\![\varphi]\!]$ for every $[\varphi, l] \in \Sigma$.

We can easily prove the following statement by induction on trees: for every $\xi \in T_U^{(k)}$
and $q \in Q$ we have

$\xi \in L(\mathcal{A}, q) \iff \exists (\zeta \in L(\mathcal{A}', q)) \text{ such that } \xi \in \tau(\zeta)$,

which proves that $L(\mathcal{A}) = \tau(L(\mathcal{A}'))$.

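For the proof of the other implication, let us consider a finite-state tree automaton
$\mathcal{A}' = (Q, \Sigma, \delta, F)$ such that $L(\mathcal{A}')$ is k-bounded. We may assume without loss of
generality that $\mathrm{maxrk}(\Sigma) \le k$. Moreover, let $\tau : \Sigma \to \mathcal{P}(U)$ be a relabeling. We
construct the sk-ta $\mathcal{A} = (Q, U, \Phi, F, \delta')$, where $\Phi$ and $\delta'$ are defined as follows:

• $\Phi = \{\varphi_\sigma \mid \sigma \in \Sigma\}$, where $[\![\varphi_\sigma]\!] = \tau(\sigma)$ for every $\sigma \in \Sigma$,

• $\delta' = \{(q_1 \ldots q_l, \varphi_\sigma, q) \mid (q_1 \ldots q_l, q) \in \delta_\sigma \text{ for some } l \ge 0, \sigma \in \Sigma_l\}$.

It should be clear that $L(\mathcal{A}) = \tau(L(\mathcal{A}'))$. ■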
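
The first construction of the proof can be sketched in code: every rule with guard $\varphi$ and l states on its left-hand side contributes the ranked symbol $[\varphi, l]$, and the relabeling sends this symbol to $[\![\varphi]\!]$. The sketch below makes assumptions beyond the paper: guards are referred to by hypothetical names (`"div2"`, `"div3"`), trees are pairs `(label, children)`, and `classical_automaton` and `in_relabeling` are illustrative function names.

```python
from collections import defaultdict


def classical_automaton(rules):
    """From sk-ta rules (lhs, guard_name, q) build the ranked symbols [phi, l]
    and the transition sets delta_[phi, l] of Lemma 3.4(1)."""
    delta = defaultdict(set)
    for lhs, guard_name, q in rules:
        symbol = (guard_name, len(lhs))  # the symbol [phi, l] of rank l
        delta[symbol].add((lhs, q))
    return dict(delta)


def in_relabeling(xi, zeta, guards):
    """Check xi in tau(zeta), where tau([phi, l]) = [[phi]]."""
    (a, xs), ((guard_name, l), zs) = xi, zeta
    return (len(xs) == len(zs) == l and guards[guard_name](a)
            and all(in_relabeling(x, z, guards) for x, z in zip(xs, zs)))


# The rules of the s2-ta from Example 3.2, with named guards.
GUARDS = {"div2": lambda n: n % 2 == 0, "div3": lambda n: n % 3 == 0}
RULES = [((), "div2", 2), ((2, 2), "div2", 2),
         ((), "div3", 3), ((3, 3), "div3", 3)]

delta = classical_automaton(RULES)
zeta = (("div2", 2), ((("div2", 0), ()), (("div2", 0), ())))
print(sorted(delta))
print(in_relabeling((2, ((4, ()), (6, ()))), zeta, GUARDS))
```

Because $[\![\varphi]\!]$ may be infinite, the relabeling is represented by a membership test rather than by an explicit set of output trees.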

By letting $\tau$ be the identity mapping in Lemma 3.4(2), we obtain that each recognizable
tree language is also s-recognizable. A further consequence of Lemma 3.4 is
the mentioned characterization.

Theorem 3.5 A tree language L is sk-recognizable if and only if it is the image of a
k-bounded recognizable tree language under a k-tree relabeling.

Using the above characterization result, we can easily give examples of bounded
tree languages that are not s-recognizable. For an infinite U, we call a tree language
$L \subseteq T_U$ uniform if it satisfies the following conditions:

(a) L is infinite,

(b) all trees in L have the same shape, i.e., for every $\xi, \zeta \in L$, we have $\mathrm{pos}(\xi) =
\mathrm{pos}(\zeta)$, and

(c) for every $\xi \in L$, there is an $a \in U$ such that $\xi(w) = a$ for every $w \in \mathrm{pos}(\xi)$.

For instance, the tree language $L_2 = \{a(a) \mid a \in U\}$ is uniform provided U is infinite.
In particular, $\mathrm{pos}(\xi) = \{\varepsilon, 1\}$ for every $\xi \in L_2$. Now we can prove the following.

Lemma 3.6 Let $L \subseteq T_U^{(k)}$ be a uniform tree language such that $|\mathrm{pos}(\xi)| > 1$ for every
$\xi \in L$. Then L is not sk-recognizable.

Proof We prove by contradiction, i.e., we assume that L is sk-recognizable. By
Lemma 3.4(1), there is a ranked alphabet $\Sigma$, a k-bounded recognizable tree language
$L' \subseteq T_\Sigma$, and a relabeling $\tau : \Sigma \to \mathcal{P}(U)$ such that $L = \tau(L')$. Since $\tau$, being a k-tree
relabeling, preserves the shape of trees, the shape of all trees in $L'$ is the same as that
of all trees in L. Then, since $\Sigma$ is a finite set, $L'$ is also finite. Finally, since L is
infinite, there are a tree $\zeta \in L'$, different positions v and w of $\zeta$, and different labels
$a, b \in U$ such that $a \in \tau(\zeta(v))$ and $b \in \tau(\zeta(w))$. Then there is a tree $\xi \in \tau(\zeta)$ such that
$\xi(v) = a$ and $\xi(w) = b$, which contradicts condition (c) for uniform tree languages. ■

By the above lemma, for an infinite U, the 1-bounded tree language $L_2$ is not
s-recognizable.

3.3 Comparison with variable tree automata



In [GKS10] another automaton model with infinite input alphabet was introduced. It
is called variable (string) automaton. In [MRU] this concept has been extended to
variable tree automata over infinite alphabets (vta). The theory of vta is different from
that of sta, e.g., the class of s-recognizable tree languages is closed under complement
(cf. [VB11a, Prop. 3]) which does not hold for the class of v-recognizable tree languages
(cf. [MRU, Cor. 2], and [GKS10, Thm. 2]). Moreover, every sta is determinizable
(cf. [VB11a, Thm. 1]), whereas not every variable (string) automaton over infinite
alphabets is determinizable (cf. [GKS10, Sec. 4.1]).

In this section we will compare the recognition power of sta and of vta. In order to 
be able to do so, (1) we modify our sta model a bit and then (2) we recall the concepts 
of vta from [MRU] in a slightly adapted form. 

By a ranked set we mean a nonempty set U of symbols such that with each symbol
$a \in U$ an element in $\mathbb{N}$, the rank of a, is associated. For every $l \ge 0$, we denote by $U_l$
the set of all symbols of U with rank l.

The set of trees over a ranked set U is defined in the obvious way. 

An sk-ta $\mathcal{A} = (Q, U, \Phi, F, R)$ is a ranked sk-ta (rsk-ta) if

• U is a ranked set, and

• $\Phi$ is a finite set of predicates (we do not require that predicates in $\Phi$ are recursive
and that the emptiness problem in $\Phi$ is decidable).

The concepts of an rsk-recognizable tree language and an rs-recognizable tree language
are defined in the obvious way.

Now we prepare the definition of a variable tree automaton. Let U and V be
ranked sets. A rank preserving relabeling (r-relabeling) from U to V is a mapping
$\tau : U \to \mathcal{P}(V)$ such that $\tau(U_l) \subseteq V_l$ ($l \ge 0$). Then $\tau$ extends to trees in the same
way as in case of k-tree relabelings (cf. Section 2.3). We note that $\tau(a)$ need not be
recursive and $\tau(a) = \emptyset$ need not be decidable for $a \in U$.

Let $\Sigma$ be a ranked alphabet, V an infinite ranked set, A, Z, and Y ranked alphabets.
We say that the collection (A, Z, Y) is a valid partitioning of $\Sigma$ for V if

• $A = \Sigma \cap V$, and $\Sigma_l = A_l \cup Z_l \cup Y_l$ for every $l \ge 0$,

• A, Z, and Y are pairwise disjoint, and

• $|Y_l| \le 1$ for every $0 \le l \le \mathrm{maxrk}(\Sigma)$.

The elements of Z and Y are called bounded variable symbols and free variable symbols. 

Let (A, Z, Y) be a valid partitioning of $\Sigma$ for V and $\tau : \Sigma \to \mathcal{P}(V)$ an r-relabeling.
We say that $\tau$ is (A, Z, Y)-valid if

(i) $\tau$ is the identity on A,

(ii) $|\tau(z)| = 1$ for every $z \in Z$,

(iii) $\tau$ is injective on Z and $A_l \cap \tau(Z_l) = \emptyset$ for every $l \ge 0$, and

(iv) $\tau(y) = V_l \setminus (A_l \cup \tau(Z_l))$ for every $l \ge 0$ and $y \in Y_l$.

Figure 1: An (A, Z, Y)-valid r-relabeling $\tau : \Sigma \to \mathcal{P}(V)$.

We denote the set of all (A, Z, Y)-valid r-relabelings by VR(A, Z, Y). In Fig. 1 we
illustrate the conditions for a valid r-relabeling.

A variable tree automaton (vta) is a tuple $\mathcal{B} = (\mathcal{A}, V, A, Z, Y)$ where

• $\mathcal{A} = (Q, \Sigma, \delta, F)$ is a finite-state tree automaton,

• V is an infinite ranked set,

• (A, Z, Y) is a valid partitioning of $\Sigma$ for V.

The tree language recognized by $\mathcal{B}$ is the set

$L(\mathcal{B}) = \bigcup (\tau(L(\mathcal{A})) \mid \tau \in \mathrm{VR}(A, Z, Y))$.

We call a tree language v-recognizable if it can be recognized by a vta.

Proposition 3.7 For every ranked alphabet $\Sigma$, every recognizable tree language L over
$\Sigma$ is also v-recognizable.

Proof Let $\mathcal{A}$ be a finite-state tree automaton (with input ranked alphabet $\Sigma$) such
that $L = L(\mathcal{A})$. Moreover, let V be an arbitrary infinite ranked set such that $\Sigma \subseteq V$.
We observe that $(\Sigma, \emptyset, \emptyset)$ is a valid partitioning of $\Sigma$ for V, hence $\mathcal{B} = (\mathcal{A}, V, \Sigma, \emptyset, \emptyset)$
is a vta over V. Moreover, the only $(\Sigma, \emptyset, \emptyset)$-valid r-relabeling is the identity mapping
over $\Sigma$. Hence we obtain that $L(\mathcal{B}) = L(\mathcal{A})$. ■

Next we relate v-recognizable tree languages and s-recognizable tree languages.

Proposition 3.8 Let $\mathcal{B} = (\mathcal{A}, V, A, Z, Y)$ be a vta and $\mathcal{A}$ have input ranked alphabet
$\Sigma$. Then there is a family $(L_\tau \mid \tau \in \mathrm{VR}(A, Z, Y))$ of rsk-recognizable tree languages
over V such that

$L(\mathcal{B}) = \bigcup (L_\tau \mid \tau \in \mathrm{VR}(A, Z, Y))$,

where $k = \mathrm{maxrk}(\Sigma)$.

Proof We note that every $\tau \in \mathrm{VR}(A, Z, Y)$ is a k-tree relabeling. Hence, by an easy
adaptation of Lemma 3.4(2), we have that the tree language $\tau(L(\mathcal{A}))$ is recognizable
by an rsk-ta $\mathcal{A}_\tau$. Hence the statement holds with $L_\tau = L(\mathcal{A}_\tau)$. ■

In spite of the above fact, we can prove the following statement. 

Theorem 3.9 The class of v-recognizable tree languages and the class of s-recognizable 
tree languages are incomparable with respect to inclusion. 

Proof a) We give a v-recognizable tree language and show that it is not
rs-recognizable. For this, let $V = V_0 \cup V_1$ be an infinite ranked set such that $V_0 = \{c\}$,
and consider the ranked alphabet $\Sigma = \Sigma_0 \cup \Sigma_1$ with $\Sigma_0 = \{c\}$ and $\Sigma_1 = \{z\}$ with
$z \notin V$. Let $\mathcal{A}$ be a finite-state tree automaton with input ranked alphabet $\Sigma$ such that
$L(\mathcal{A}) = \{zzc\}$ (where parentheses are omitted). Now $(\{c\}, \{z\}, \emptyset)$ is a valid partitioning
of $\Sigma$ for V, hence $\mathcal{B} = (\mathcal{A}, V, \{c\}, \{z\}, \emptyset)$ is a vta over V. Since every $(\{c\}, \{z\}, \emptyset)$-valid
r-relabeling takes z to an element $a \in V_1$, we have $L(\mathcal{B}) = \{aac \mid a \in V_1\}$. By an easy
adaptation of Lemma 3.6 we obtain that $L(\mathcal{B})$ is not rs-recognizable.

b) We give an rs1-recognizable tree language and show that it is not v-recognizable.

For this, let $\Sigma = \Sigma_0 \cup \Sigma_1$ be a ranked alphabet with $\Sigma_0 = \{a\}$ and $\Sigma_1 = \{e, o\}$, and
let $V = V_0 \cup V_1$ be an infinite ranked set with $V_0 = \{a\}$ and $V_1 = \mathbb{N}$. Consider the
recognizable tree language

$$L = \{a,\ oa,\ eoa,\ oeoa,\ eoeoa,\ \ldots\}$$

over $\Sigma$ (where parentheses are omitted), and the r-relabeling $\tau$ defined by

$\tau(a) = a$, $\tau(e)$ = set of all even numbers, and $\tau(o)$ = set of all odd numbers.

By the adaptation of Lemma 3.4(2) to rs$k$-ta, we obtain that the tree language $\tau(L)$
can be recognized by an rs1-ta. Roughly speaking, $\tau(L)$ consists of all sequences of the
form $n_k \ldots n_2 n_1 a$, where $k \ge 0$ and $n_i$ is an odd number if $i$ is odd and an even number
otherwise. We show by contradiction that $\tau(L)$ cannot be recognized by any vta.
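The informal description of $\tau(L)$ can be made concrete with a small membership test. The following is only a sketch: the encoding of a monadic tree as a list of its unary labels (outermost first) and the function name `in_tau_L` are assumptions made for illustration and do not come from the paper.

```python
def in_tau_L(labels, leaf="a"):
    """Check whether the monadic tree n_k ... n_2 n_1 a belongs to tau(L).

    `labels` lists the unary labels from the root downwards, i.e.
    labels == [n_k, ..., n_2, n_1]; `leaf` is the nullary label at the bottom.
    (Hypothetical encoding, chosen only for this illustration.)
    """
    if leaf != "a":
        return False
    k = len(labels)
    for pos, n in enumerate(labels):
        i = k - pos  # 1-based index of the label, counted from the leaf upwards
        if not isinstance(n, int) or n < 0:
            return False  # unary labels must be natural numbers
        if (n % 2 == 1) != (i % 2 == 1):
            return False  # n_i must be odd exactly when i is odd
    return True
```

Since the number of distinct labels in such a sequence is unbounded, the test has to inspect parities rather than compare against finitely many fixed symbols, which is the intuition behind the non-recognizability argument by vta.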

For this, assume that there is a vta $\mathcal{B} = (\mathcal{A}, V, \Delta, Z, Y)$, where the input alphabet
of $\mathcal{A}$ is $\Sigma = \Delta \cup Z \cup Y$, such that $L(\mathcal{B}) = \tau(L)$. We may assume without loss of
generality that $Y = \emptyset$, which can be seen as follows. Assume that $Y = \{y\}$, and that
there is a tree $\xi \in L(\mathcal{A})$ such that $y$ occurs in $\xi$ at the position $1^i$ (see Section 2.2
for the definition of a position). Moreover, let $\tau' : \Sigma \to \mathcal{P}(V)$ be a $(\Delta, Z, Y)$-valid
r-relabeling. Since $\tau'(y)$ contains both even and odd numbers, there are trees $\zeta$ and $\zeta'$
in the set $\tau'(\xi)$ such that at the position $1^i$ of $\zeta$ and $\zeta'$ there is an odd number and an
even number, respectively. On the other hand, $\tau'(\xi) \subseteq L(\mathcal{B})$, which is a contradiction
because at the position $1^i$ of every tree in $L(\mathcal{B})$ there is either an odd number or an
even number (depending on whether $i$ is odd or even).

Hence $Y = \emptyset$. Now assume that $|\Delta \cup Z| = m$. Then every tree $\xi \in L(\mathcal{A})$ consists
of at most $m$ different symbols. Moreover, by the definition of a $(\Delta, Z, Y)$-valid
r-relabeling, for every $\tau' \in \mathrm{VR}(\Delta, Z, Y)$, each tree in $\tau'(\xi)$ consists of at most $m$ different
symbols. This means that each tree in $L(\mathcal{B})$ consists of at most $m$ different symbols, which
contradicts the much more flexible form of the trees in $\tau(L)$. ■

4 Symbolic regular tree grammars 

In this section we introduce symbolic regular tree grammars and show that they are 
semantically equivalent to sta. 

Definition 4.1 A symbolic $k$-bounded regular tree grammar (s$k$-rtg) is a tuple $\mathcal{G} =
(Q, U, \Phi, q_0, R)$, where

• $Q$ is a finite set (states²),

• $(U, \Phi)$ is a label structure,

• $q_0 \in Q$ (initial state), and

• $R$ is a finite set of rules of the form $q \to u$ where $q \in Q$ and $u \in T^{(k)}_{\mathrm{BC}(\Phi)}(Q)$.

By a symbolic regular tree grammar (srtg) we mean an s$k$-rtg for some $k \in \mathbb{N}$.

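The s$k$-rtg $\mathcal{G} = (Q, U, \Phi, q_0, R)$ induces the derivation relation $\Rightarrow_{\mathcal{G}} \subseteq T_U^{(k)}(Q) \times
T_U^{(k)}(Q)$ defined by $\xi_1 \Rightarrow_{\mathcal{G}} \xi_2$ iff there is a position $w \in \mathrm{pos}_q(\xi_1)$ and a rule $q \to u$ in
$R$ such that $\xi_2 = \xi_1[u']_w$, where $u'$ is obtained from $u$ by replacing every occurrence
of $\varphi \in \mathrm{BC}(\Phi)$ by some $a \in [\![\varphi]\!]$. (The condition that all predicates in $\Phi$ are recursive
makes the relation $\Rightarrow_{\mathcal{G}}$ recursive.)

The $k$-bounded tree language $L(\mathcal{G}, q)$ generated by $\mathcal{G}$ from a state $q \in Q$ is the set

$$L(\mathcal{G}, q) = \{\xi \in T_U^{(k)} \mid q \Rightarrow^*_{\mathcal{G}} \xi\}.$$

The tree language generated by $\mathcal{G}$, denoted by $L(\mathcal{G})$, is the set $L(\mathcal{G}, q_0)$. A tree language
$L \subseteq T_U^{(k)}$ is called symbolically $k$-regular (for short: s$k$-regular) if there is an s$k$-rtg $\mathcal{G}$
such that $L = L(\mathcal{G})$. Moreover, a tree language is s-regular if it is s$k$-regular for some
$k \in \mathbb{N}$.

Two s$k$-rtg $\mathcal{G}_1$ and $\mathcal{G}_2$ are equivalent if $L(\mathcal{G}_1) = L(\mathcal{G}_2)$.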
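To illustrate how a symbolic regular tree grammar generates trees, here is a minimal Python sketch of a membership test for $L(\mathcal{G}, q)$. The encoding is an assumption made for this illustration: trees are tuples `(label, child, ...)`, predicates are Python callables, a right-hand side is either a state name or a tuple `(predicate, sub_rhs, ...)`, and every rule is assumed to have a predicate at the root of its right-hand side (so the recursion terminates). The example grammar is hypothetical and generates monadic trees whose unary labels alternate in parity, with an odd label directly above the leaf `a`.

```python
def matches(rhs, tree, rules):
    """Does `tree` arise from the right-hand side `rhs` by instantiating
    predicates with labels and rewriting the remaining states?"""
    if isinstance(rhs, str):              # rhs is a state: keep deriving
        return generates(rhs, tree, rules)
    pred, subs = rhs[0], rhs[1:]
    label, children = tree[0], tree[1:]
    return (pred(label)
            and len(subs) == len(children)
            and all(matches(s, c, rules) for s, c in zip(subs, children)))


def generates(state, tree, rules):
    """Membership test: is `tree` in L(G, state)?"""
    return any(matches(rhs, tree, rules) for rhs in rules.get(state, []))


# Hypothetical predicates over the label set N ∪ {"a"}.
is_a = lambda x: x == "a"
odd = lambda x: isinstance(x, int) and x % 2 == 1
even = lambda x: isinstance(x, int) and x % 2 == 0

# Hypothetical s1-rtg with initial state "S".
RULES = {
    "S": [(is_a,), (odd, "X"), (even, "Y")],
    "X": [(is_a,), (even, "Y")],   # what may appear below an odd label
    "Y": [(odd, "X")],             # what may appear below an even label
}
```

For instance, `generates("S", (2, (3, ("a",))), RULES)` checks the tree $2(3(a))$. Note that a single predicate such as `odd` stands for infinitely many labels, which is exactly what distinguishes the symbolic setting from classical regular tree grammars.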

In the following we give a characterization of s-regular tree languages in terms of 
regular tree languages and relabelings. 



2 In classical regular tree grammars these elements are called nonterminals.

Lemma 4.2

1. For every s$k$-regular tree language $L$ we can effectively construct a $k$-bounded
regular tree language $L'$ and a $k$-tree relabeling $\tau$ such that $L = \tau(L')$.

2. For every $k$-bounded regular tree language $L'$ and $k$-tree relabeling $\tau$ we can
effectively construct an s$k$-regular tree language $L$ such that $L = \tau(L')$.

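Proof First let $L = L(\mathcal{G})$ for some s$k$-rtg $\mathcal{G} = (Q, U, \Phi, q_0, R)$. We construct the
regular tree grammar $\mathcal{G}' = (Q, \Sigma, q_0, R')$ as follows.

• For every $0 \le l \le k$, let

$$\Sigma_l = \{[\varphi, l] \mid \exists (q \to u) \in R,\ w \in \mathrm{pos}_{\mathrm{BC}(\Phi)}(u) : u(w) = \varphi \text{ and } \mathrm{rk}_w(u) = l\},$$

• and let $R'$ be the set of all rules $q \to u'$ such that there is a rule $q \to u$ in $R$ and
$u'$ is obtained from $u$ as follows: for every $w \in \mathrm{pos}_{\mathrm{BC}(\Phi)}(u)$, we replace $u(w)$ by
$[u(w), \mathrm{rk}_w(u)]$.

It is obvious that $L(\mathcal{G}')$ is $k$-bounded. Moreover, we let the relabeling $\tau : \Sigma \to \mathcal{P}(U)$
be defined by $\tau([\varphi, l]) = [\![\varphi]\!]$ for every $0 \le l \le k$ and $[\varphi, l] \in \Sigma_l$.

We can prove the following statement by tree induction:
for every $\zeta \in T_U^{(k)}$ and $q \in Q$:

$q \Rightarrow^*_{\mathcal{G}} \zeta$ iff there is a $\xi \in T_\Sigma$ such that $q \Rightarrow^*_{\mathcal{G}'} \xi$ and $\zeta \in \tau(\xi)$.

Then $L(\mathcal{G}) = \tau(L(\mathcal{G}'))$.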

For the proof of Statement 2, let us consider a regular tree grammar $\mathcal{G}' =
(Q, \Sigma, q_0, R)$ such that $L(\mathcal{G}')$ is $k$-bounded and a relabeling $\tau : \Sigma \to \mathcal{P}(U)$. We
may assume without loss of generality that $\mathrm{maxrk}(\Sigma) \le k$. We construct the s$k$-rtg
$\mathcal{G} = (Q, U, \Phi, q_0, R')$, where $\Phi$ and $R'$ are defined as follows:

• $\Phi = \{\varphi_\sigma \mid \sigma \in \Sigma\}$, where $[\![\varphi_\sigma]\!] = \tau(\sigma)$ for every $\sigma \in \Sigma$,

• $R'$: if $q \to u$ is in $R$, then $q \to u'$ is in $R'$, where $u'$ is obtained from $u$ by replacing
every $\sigma$ by $\varphi_\sigma$.

It should be clear that $L(\mathcal{G}) = \tau(L(\mathcal{G}'))$. ■
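The construction in the proof of Statement 1 can be sketched in Python as follows. This is an illustrative sketch under assumptions that are not part of the paper: right-hand sides are encoded as nested tuples `(predicate, sub_rhs, ...)` with state names (strings) at the leaves, predicates are named Python functions, and a predicate is identified by its function name, so that the pair `(name, rank)` plays the role of the symbol $[\varphi, l]$.

```python
def split_grammar(rules):
    """Turn symbolic rules into classical rules over symbols [phi, l]
    together with a relabeling tau (given as a dict: symbol -> predicate)."""
    tau = {}

    def convert(rhs):
        if isinstance(rhs, str):          # a state stays unchanged
            return rhs
        pred, subs = rhs[0], rhs[1:]
        symbol = (pred.__name__, len(subs))   # the ranked symbol [phi, l]
        tau[symbol] = pred                    # tau([phi, l]) = [[phi]]
        return (symbol,) + tuple(convert(s) for s in subs)

    classical = {q: [convert(rhs) for rhs in rhss] for q, rhss in rules.items()}
    return classical, tau


# Hypothetical predicates and rules used only to exercise the sketch.
def is_a(x):
    return x == "a"

def odd(x):
    return isinstance(x, int) and x % 2 == 1

def even(x):
    return isinstance(x, int) and x % 2 == 0

SYMBOLIC_RULES = {
    "S": [(is_a,), (odd, "X"), (even, "Y")],
    "X": [(is_a,), (even, "Y")],
    "Y": [(odd, "X")],
}

CLASSICAL_RULES, TAU = split_grammar(SYMBOLIC_RULES)
```

The resulting classical grammar uses only finitely many symbols, one per pair of a predicate and a rank occurring in the rules, while all the infinite behaviour is moved into the relabeling.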

It follows from Lemma 4.2(2) that each regular tree language is also s-regular. We
obtain this by letting $\tau$ be the identity relabeling. As another consequence of Lemma
4.2, we obtain the following characterization result.

Theorem 4.3 A tree language L is sk-regular if and only if it is the image of a k- 
bounded regular tree language under a k-tree relabeling. 

We can also show that s-recognizable tree languages are the same as s-regular tree 
languages. 

Theorem 4.4 A tree language is s-recognizable if and only if it is s-regular.

Proof It follows directly from Lemmas 3.4 and 4.2 and the fact that a tree language
is recognizable if and only if it can be generated by a regular tree grammar (cf. e.g.
Theorem 3.6 in Chapter II of [GS84]). ■

In the rest of this section we show some useful transformations on s$k$-rtg which
preserve the generated tree language. For this, we need some preparation.

Let $\mathcal{G} = (Q, U, \Phi, q_0, R)$ be an s$k$-rtg. A rule $q \to u$ in $R$ is feasible if, for every
predicate $\varphi$ which occurs in $u$, we have $[\![\varphi]\!] \ne \emptyset$, and we call $\mathcal{G}$ clean if all its rules are
feasible. It is obvious that rules which are not feasible cannot be used in any derivation
that yields a tree over $U$. Hence, they can be dropped from $R$ without any effect on the
generated tree language $L(\mathcal{G})$. Moreover, it is decidable whether a rule is feasible or not
due to the fact that the emptiness of predicates in $\Phi$ is decidable. Summing up, for every
s$k$-rtg we can construct an equivalent one which is clean.

The s$k$-rtg $\mathcal{G}$ is in normal form if every rule has the form $q \to \varphi(q_1, \ldots, q_l)$ for some
$0 \le l \le k$, $\varphi \in \mathrm{BC}(\Phi)$, and $q_1, \ldots, q_l \in Q$. A state $q \in Q$ is reachable if there is a tree
$\xi \in T_U^{(k)}(Q)$ such that $q_0 \Rightarrow^*_{\mathcal{G}} \xi$ and $q$ occurs in $\xi$. Moreover, the state $q$ is productive
if $L(\mathcal{G}, q) \ne \emptyset$. Finally, $\mathcal{G}$ is reduced if all its states are reachable and productive. We
can prove the following result.

Lemma 4.5 For every s$k$-rtg there is an equivalent reduced s$k$-rtg which is in normal
form.

Proof Let $\mathcal{G} = (Q, U, \Phi, q_0, R)$ be an s$k$-rtg. We may assume that $\mathcal{G}$ is clean. By
Lemma 4.2(1), there is a regular tree grammar $\mathcal{G}'$ over some ranked alphabet $\Sigma$ such
that $L(\mathcal{G}')$ is $k$-bounded, and there is a relabeling $\tau : \Sigma \to \mathcal{P}(U)$ such that $L(\mathcal{G}) =
\tau(L(\mathcal{G}'))$. Since $\mathcal{G}$ is clean, $\tau(\sigma) \ne \emptyset$ for every $\sigma \in \Sigma$ (see the proof of that lemma).

Then we transform $\mathcal{G}'$ into an equivalent regular tree grammar $\mathcal{G}''$ which is reduced
and is in normal form using the transformations in [CDG+97, Prop. 2.1.3, 2.1.4]. Note
that $L(\mathcal{G}'')$ is $k$-bounded and $L(\mathcal{G}) = \tau(L(\mathcal{G}''))$.

Finally, we follow the proof of Lemma 4.2(2) to construct an s$k$-rtg $\hat{\mathcal{G}}$ from $\mathcal{G}''$ and
$\tau$ such that $L(\hat{\mathcal{G}}) = \tau(L(\mathcal{G}''))$. Then $\hat{\mathcal{G}}$ is clean due to the above condition on $\tau$.
Moreover, a direct inspection of that construction shows that $\hat{\mathcal{G}}$ is reduced and is in
normal form. ■

5 Symbolic tree transducers

In this section we formalize our adaptation of the concept of a symbolic tree transducer
from [VB11a, VB11b]. Then we show some basic properties, relate symbolic
tree transducers to classical top-down tree transducers [Tha70, Rou70, Eng75], and
compare our model with the original one. Finally, we prove a composition result for
symbolic tree transducers.

5.1 Definition of stt

For every finite set $Q$ and $l \in \mathbb{N}$, we let $Q(X_l) = \{q(x_i) \mid q \in Q, x_i \in X_l\}$.

We denote by $\mathcal{F}(U \to V)$ the set of all unary computable functions from $U$ to $V$.
Moreover, for every tree $u \in T_{\mathcal{F}(U \to V)}(Y)$ and $a \in U$, we denote by $u(a)$ the tree which
is obtained by replacing every function $f$ in $u$ by the value $f(a) \in V$. Hence we have
that $u(a) \in T_V(Y)$.

Definition 5.1 Let $k \in \mathbb{N}$. A symbolic $k$-bounded tree transducer (s$k$-tt) is a tuple
$\mathcal{M} = (Q, U, \Phi, V, q_0, R)$, where

• $Q$ is a finite set (states),

• $(U, \Phi)$ is a label structure (input label structure) and $V$ is a set (output labels),

• $q_0 \in Q$ (initial state), and

• $R$ is a finite set of rules of the form $q(\varphi(x_1, \ldots, x_l)) \to u$ where $q \in Q$, $\varphi \in \mathrm{BC}(\Phi)$,
$0 \le l \le k$, and $u \in T^{(k)}_{\mathcal{F}(U \to V)}(Q(X_l))$.

Clearly, every s$k$-tt is an s$(k+1)$-tt. By an stt we mean an s$k$-tt for some $k \in \mathbb{N}$.

For a rule $\rho = q(\varphi(x_1, \ldots, x_l)) \to u$, we call the pair $(q, l)$ the left-hand side state-rank
pair, $\varphi$ the guard, and $u$ the right-hand side of $\rho$, and denote them by $\mathrm{lhs}(\rho)$,
$\mathrm{grd}(\rho)$, and $\mathrm{rhs}(\rho)$, respectively.

We say that the stt $\mathcal{M}$ is linear (resp. nondeleting) if, for each rule $\rho$ as above,
its right-hand side contains at most (resp. at least) one occurrence of $x_i$ for every
$1 \le i \le l$.

Moreover, $\mathcal{M}$ is deterministic if, for any two different rules $\rho_1$ and $\rho_2$ in $R$, the
condition $\mathrm{lhs}(\rho_1) = \mathrm{lhs}(\rho_2)$ entails that $[\![\mathrm{grd}(\rho_1)]\!] \cap [\![\mathrm{grd}(\rho_2)]\!] = \emptyset$. Finally, $\mathcal{M}$ is total
if for every $q \in Q$ and $0 \le l \le k$, we have

$$\Big[\!\!\Big[ \bigvee_{\rho \in R,\ \mathrm{lhs}(\rho) = (q, l)} \mathrm{grd}(\rho) \Big]\!\!\Big] = U.$$

We note that, as for sta, no s$k$-tt is a total s$(k+1)$-tt.
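The determinism and totality conditions can be probed mechanically when guards are given as executable predicates. The sketch below is only an approximation under stated assumptions: rules are abbreviated to triples `(state, rank, guard)`, guards are Python callables, and the conditions are checked on a finite sample of labels. For an infinite label set such a check can refute a property, but it cannot prove it; a real decision procedure would rely on the decidability of predicate emptiness in the label structure.

```python
def is_deterministic_on(rules, sample):
    """No two different rules with the same (state, rank) pair have guards
    that are both satisfied by some label of the sample."""
    for idx, (q1, l1, g1) in enumerate(rules):
        for (q2, l2, g2) in rules[idx + 1:]:
            if (q1, l1) == (q2, l2) and any(g1(a) and g2(a) for a in sample):
                return False
    return True


def is_total_on(rules, states, k, sample):
    """For every state q, every rank l <= k and every sampled label a,
    some rule with left-hand side pair (q, l) has a guard satisfied by a."""
    return all(
        any(g(a) for (q2, l2, g) in rules if (q2, l2) == (q, l))
        for q in states
        for l in range(k + 1)
        for a in sample
    )


# Hypothetical guards over the natural numbers.
div6 = lambda a: a % 6 == 0        # stands for div(2) AND div(3)
top = lambda a: True               # the always-true predicate

RULES_A = [("q", 2, div6), ("q", 2, top), ("q", 0, top)]
RULES_B = [("q", 2, top), ("q", 1, top), ("q", 0, top)]
```

With these hypothetical rule sets, `RULES_A` fails both checks (two overlapping guards for rank 2, and no rule at all for rank 1), whereas `RULES_B` passes both on any sample.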

Next we define the semantics of an s$k$-tt $\mathcal{M} = (Q, U, \Phi, V, q_0, R)$. We define the
derivation relation of $\mathcal{M}$, denoted by $\Rightarrow_{\mathcal{M}}$, to be the smallest binary relation
$\Rightarrow_{\mathcal{M}} \subseteq T_V(Q(T_U)) \times T_V(Q(T_U))$ such that for every $\xi_1, \xi_2 \in T_V(Q(T_U))$:

$\xi_1 \Rightarrow_{\mathcal{M}} \xi_2$ iff there is a position $w \in \mathrm{pos}(\xi_1)$ and a rule $q(\varphi(x_1, \ldots, x_l)) \to u$ in $R$
such that

• $\xi_1|_w = q(a(\zeta_1, \ldots, \zeta_l))$ for some $a \in [\![\varphi]\!]$ and $\zeta_1, \ldots, \zeta_l \in T_U^{(k)}$, and

• $\xi_2 = \xi_1[u']_w$, where $u'$ is obtained from $u(a)$ by replacing every index $p(x_i) \in
Q(X_l)$ by $p(\zeta_i)$.

The conditions that all predicates in $\Phi$ are recursive and that all functions in the right-hand
side of the rules are computable make the relation $\Rightarrow_{\mathcal{M}}$ recursive. Sometimes,
we drop $\mathcal{M}$ from $\Rightarrow_{\mathcal{M}}$.

Let $q \in Q$, $\xi \in T_U$, and $\zeta \in T_V$. We can show by induction on $\xi$ that if $\xi \in T_U^{(k)}$
and $q(\xi) \Rightarrow^*_{\mathcal{M}} \zeta$ holds, then also $\zeta \in T_V^{(k)}$. The $q$-tree transformation computed by $\mathcal{M}$,
denoted by $\mathcal{M}_q$, is the relation

$$\mathcal{M}_q = \{(\xi, \zeta) \in T_U^{(k)} \times T_V^{(k)} \mid q(\xi) \Rightarrow^*_{\mathcal{M}} \zeta\}.$$

The tree transformation computed by $\mathcal{M}$, also denoted by $\mathcal{M}$, is defined by $\mathcal{M} =
\mathcal{M}_{q_0}$. The class of tree transformations computed by s$k$-tt (resp. linear, nondeleting,
deterministic, and total s$k$-tt) is denoted by STT$^{(k)}$ (resp. l-STT$^{(k)}$, n-STT$^{(k)}$,
d-STT$^{(k)}$, and t-STT$^{(k)}$). These restrictions can be combined in the usual way; for
instance, we will denote by ln-STT$^{(k)}$ the class of tree transformations computed by
linear and nondeleting s$k$-tt.

A deterministic s$k$-tt (total s$k$-tt) transforms every input tree into at most one (at
least one) output tree.

Lemma 5.2 If $\mathcal{M}$ is a deterministic (resp. total) s$k$-tt, then we have $|\mathcal{M}_q(\xi)| \le 1$
(resp. $|\mathcal{M}_q(\xi)| \ge 1$) for every $q \in Q$ and $\xi \in T_U^{(k)}$.

Example 5.3 We consider the s2-tt $\mathcal{M} = (Q, U, \Phi, U, q, R)$ with $Q = \{q\}$, $U = \mathbb{N}$,
and $\Phi = \{\mathrm{div}(2), \mathrm{div}(3)\}$, where $[\![\mathrm{div}(i)]\!]$ is the set of all non-negative integers which are
divisible by $i$. Moreover, $R$ has the following rules:

$\rho_1$: $q((\mathrm{div}(2) \wedge \mathrm{div}(3))(x_1, x_2)) \to [{:}6](q(x_1), q(x_1))$
$\rho_2$: $q(\top(x_1, x_2)) \to [\mathrm{id}](q(x_1), q(x_2))$
$\rho_3$: $q(\top) \to [\mathrm{id}]$

where the unary functions $[{:}6]$ and $\mathrm{id}$ perform division by 6 and the identity, respectively.
Note that $\mathcal{M}$ is not deterministic, because $\mathrm{lhs}(\rho_1) = \mathrm{lhs}(\rho_2) = (q, 2)$ and

$$[\![\mathrm{grd}(\rho_1)]\!] \cap [\![\mathrm{grd}(\rho_2)]\!] = [\![\mathrm{div}(2) \wedge \mathrm{div}(3)]\!] \cap [\![\top]\!] = [\![\mathrm{div}(2) \wedge \mathrm{div}(3)]\!] \ne \emptyset.$$

Also note that $\mathcal{M}$ is not total, because for $l = 1$ we have:

$$\Big[\!\!\Big[ \bigvee_{\rho \in R,\ \mathrm{lhs}(\rho) = (q, 1)} \mathrm{grd}(\rho) \Big]\!\!\Big] = \Big[\!\!\Big[ \bigvee_{\rho \in \emptyset} \mathrm{grd}(\rho) \Big]\!\!\Big] = [\![\bot]\!] = \emptyset \ne U.$$

Also $\mathcal{M}$ is neither linear nor nondeleting, because of rule $\rho_1$.

On the input tree $\xi = 6(12(4, 6), 7)$ the s2-tt $\mathcal{M}$ can perform the following derivation:

$q(6(12(4, 6), 7))$
$\Rightarrow\ 1(q(12(4, 6)), q(12(4, 6)))$
$\Rightarrow^2\ 1(2(q(4), q(4)), 12(q(4), q(6)))$
$\Rightarrow^4\ 1(2(4, 4), 12(4, 6))$

The s2-tt $\mathcal{M}$ transforms a binary tree $\xi$ in the following way. At each position $w$, $\mathcal{M}$
can reproduce the label $\xi(w)$ of this position and recursively transform the subtrees
(using rules $\rho_2$ and $\rho_3$). If $\xi(w)$ is divisible by 6, then, additionally (using rule $\rho_1$),
$\mathcal{M}$ can divide it by 6, delete the second subtree, and process two copies of the first
subtree independently.
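The behaviour of the transducer of Example 5.3 can be replayed with a short Python sketch that enumerates all output trees for a given input. The encoding of trees as tuples `(label, child, ...)` and the function name `outputs` are assumptions made for this illustration; the three branches correspond to the rules $\rho_1$, $\rho_2$, and $\rho_3$.

```python
from itertools import product


def outputs(tree):
    """Return the set of all output trees the single state q can produce
    on `tree` (hypothetical encoding: (label, child_1, ..., child_l))."""
    label, children = tree[0], tree[1:]
    results = set()
    if len(children) == 0:
        # rho_3: copy the label of a leaf
        results.add((label,))
    elif len(children) == 2:
        left_outs = outputs(children[0])
        right_outs = outputs(children[1])
        # rho_2: copy the label and translate both subtrees
        for l, r in product(left_outs, right_outs):
            results.add((label, l, r))
        # rho_1: guard div(2) AND div(3); divide by 6, drop the second
        # subtree, and translate the first subtree twice, independently
        if label % 6 == 0:
            for l1, l2 in product(left_outs, repeat=2):
                results.add((label // 6, l1, l2))
    # nodes with exactly one child match no rule, so no output is produced
    return results


XI = (6, (12, (4,), (6,)), (7,))
TARGET = (1, (2, (4,), (4,)), (12, (4,), (6,)))
```

In this encoding, `TARGET` is the tree $1(2(4,4), 12(4,6))$ reached by the derivation above, and it is one of the elements of `outputs(XI)`. Because the two copies of the first subtree are processed independently, they may be translated differently, which is visible in `TARGET`.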

Next we show that stt generalize (classical) top-down tree transducers. For every
$b \in V$, we denote by $c_b$ the constant function in $\mathcal{F}(U \to V)$ defined by $c_b(a) = b$ for
every $a \in U$. An s$k$-tt $\mathcal{M} = (Q, U, \Phi, V, q_0, R)$ is alphabetic if

• $U$ and $V$ are ranked alphabets such that $\mathrm{maxrk}(U), \mathrm{maxrk}(V) \le k$,

• $\Phi = \{\varphi_\sigma \mid \sigma \in U\}$ where $[\![\varphi_\sigma]\!] = \{\sigma\}$,

• each rule in $R$ has the form $q(\varphi_\sigma(x_1, \ldots, x_l)) \to u$, where

  - $l = \mathrm{rk}_U(\sigma)$, and

  - for every $w \in (\mathrm{pos}(u) \setminus \mathrm{pos}_{Q(X_l)}(u))$ we have $u(w) = c_b$ and $\mathrm{rk}_u(w) = \mathrm{rk}_V(b)$
    for some $b \in V$.

We call predicates of the form $\varphi_\sigma$ alphabetic.

Let $\mathcal{M} = (Q, \Sigma, \Phi, \Delta, q_0, R)$ be an alphabetic s$k$-tt with rank mappings $\mathrm{rk}_\Sigma$ and $\mathrm{rk}_\Delta$.
Let $\mathcal{N} = (Q, \Sigma, \Delta, q_0, R')$ be a top-down tree transducer with the same rank mappings.
Then we say that $\mathcal{M}$ and $\mathcal{N}$ are related if

$q(\varphi_\sigma(x_1, \ldots, x_l)) \to u \in R$ iff $q(\sigma(x_1, \ldots, x_l)) \to u' \in R'$,

where we obtain $u'$ from $u$ by replacing $c_\delta$ by $\delta$ for every $\delta \in \Delta$.

For every alphabetic s$k$-tt $\mathcal{M}$ we can construct a related top-down tree transducer
$\mathcal{N}$ and vice versa. Moreover, it is easy to see that if $\mathcal{M}$ and $\mathcal{N}$ are related, then the
tree transformations computed by $\mathcal{M}$ and by $\mathcal{N}$ are the same. Hence we obtain the
following result.

Observation 5.4 The class of tree transformations computed by alphabetic stt is the
same as the class of top-down tree transformations.

Recall that $\iota_U$ is the identity mapping on $U$. Let $\mathcal{A} = (Q, U, \Phi, F, R)$ be an s$k$-ta.
We introduce the s$k$-tt $\mathcal{A}_{=} = (Q, U, \Phi, U, F, R_{=})$, where

$$R_{=} = \{q(\varphi(x_1, \ldots, x_l)) \to \iota_U(q_1(x_1), \ldots, q_l(x_l)) \mid (q_1 \ldots q_l, \varphi, q) \in R\}.$$

We will need the following fact.

Lemma 5.5 For every s$k$-recognizable tree language $L$, there is a linear and nondeleting
s$k$-tt $\mathcal{N}$ such that $\mathcal{N} = \iota_L$.

Proof Let $L = L(\mathcal{A})$ for some s$k$-ta $\mathcal{A}$. Then $\mathcal{N} = \mathcal{A}_{=}$ is appropriate. ■

Finally, we want to compare our model with the original one from [VB11b]. Each
rule of their symbolic tree transducer has either of the following two forms:

(a) $q(\varepsilon) \to \varepsilon$ or

(b) $q(f(x, y_1, \ldots, y_k)) \xrightarrow{\varphi} u[x, q_1(y_1), \ldots, q_k(y_k)]$

where $\varepsilon$ is the only nullary constructor for trees (more precisely, for the empty tree)
and $f$ is the only non-nullary constructor for trees. Since in our approach we have
neither the empty tree nor the constructor $\varepsilon$, there are no rules in our definition of
symbolic tree transducers which correspond to rules of type (a). Also the constructor
$\varepsilon$ does not occur in the right-hand side of rules of type (b). Then, in our approach, a
rule of type (b) looks as follows:

$$q(\varphi(y_1, \ldots, y_k)) \to \psi(u)$$

where the transformation $\psi$ is defined inductively on its argument as follows:

• $\psi(f(p, u_1, \ldots, u_l)) = (\lambda x.p)(\psi(u_1), \ldots, \psi(u_l))$, and

• $\psi(q_i(y_i)) = q_i(y_i)$.

That is, $\psi$ applies the constructor $f$, replaces an expression $p$ (in which the variable
$x$ occurs) by the unary function $\lambda x.p$, and recursively calls itself on the subterms
$u_1, \ldots, u_l$.

5.2 Composition results concerning stt 

In [VB11b], among others, composition properties of symbolic tree transformations
are considered. Their main result is Theorem 1 which, in its first statement, says that
tree transformations computed by stt are closed under composition. For this they give
the following proof: "The first statement can be shown along the lines of the proof of
compositionality of TOP [15, Theorem 3.39]." where "[15]" is [FV98] in the current
paper. Unfortunately, the mentioned proof of [FV98] is not applicable, because there
the authors only consider total and deterministic top-down tree transducers.

Moreover, there is also a deficiency on the semantic level. In Section 4.1 they claim
the following:

(†) For two arbitrary stt $\mathcal{M}$ and $\mathcal{N}$, the composition algorithm delivers an
stt which computes the composition $\mathcal{M} \circ \mathcal{N}$.

However, this is not true, which can be seen as follows. Let us apply their composition
algorithm to two alphabetic s$k$-tt $\mathcal{M}$ and $\mathcal{N}$; then the resulting s$k$-tt is also alphabetic
(by Observation 5.7) and, due to their claim, it computes $\mathcal{M} \circ \mathcal{N}$. Since alphabetic
s$k$-tt correspond to top-down tree transducers, this means that the class of all top-down tree
transformations is closed under composition. However, it is not, due to the
counterexamples given in [Rou70, p. 267] (cf. also [Tha70, Eng75]).

So, the proof of the first statement is insufficient. We even conjecture that this
statement is wrong, i.e., STT$^{(k)}$ is not closed under composition.

In this section we prove a weaker version of claim (†) which only holds for particular
stt $\mathcal{M}$ and $\mathcal{N}$, cf. Theorem 5.8. In fact, we generalize the composition theorem [Bak79,
Thm. 1] for top-down tree transducers to symbolic tree transducers.

For this, we use the composition algorithm of [VB11b] which results in the syntactic
composition $\mathcal{M};\mathcal{N}$, and we show that in certain cases the stt $\mathcal{M};\mathcal{N}$ computes the
relation $\mathcal{M} \circ \mathcal{N}$. In the following, we recall the composition algorithm of [VB11b] in
our formal setting. We note that this composition algorithm generalizes the (syntactic)
composition of top-down tree transducers as presented in the definition before Theorem
1 of [Bak79].

Let $f \in \mathcal{F}(U \to V)$ and $v \in T_{\mathcal{F}(V \to W)}(Y)$. We denote by $f \circ v$ the tree obtained
from $v$ by replacing every occurrence of a function $g \in \mathcal{F}(V \to W)$ by the function
$f \circ g \in \mathcal{F}(U \to W)$. Of course $f \circ v \in T_{\mathcal{F}(U \to W)}(Y)$.
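The two operations on trees over functions, evaluation $u(a)$ and precomposition $f \circ v$, can be sketched in Python as follows. The encoding is an assumption made for illustration: a tree is a tuple `(function, child, ...)`, and variables from $Y$ are represented by strings that are left untouched. Note that, since $f$ maps $U$ to $V$ and $g$ maps $V$ to $W$, the composition $f \circ g$ applies $f$ first and $g$ second.

```python
def evaluate(u, a):
    """u(a): replace every function f in the tree u by the value f(a)."""
    if isinstance(u, str):                 # a variable stays as it is
        return u
    f, children = u[0], u[1:]
    return (f(a),) + tuple(evaluate(c, a) for c in children)


def precompose(f, v):
    """f o v: replace every function g in the tree v by 'first f, then g'."""
    if isinstance(v, str):                 # a variable stays as it is
        return v
    g, children = v[0], v[1:]
    composed = lambda a, g=g: g(f(a))      # bind g now to avoid late binding
    return (composed,) + tuple(precompose(f, c) for c in children)


# Hypothetical example data.
div_by_6 = lambda n: n // 6
V_TREE = (lambda m: m + 1, (lambda m: 2 * m,), "y1")
```

By construction, evaluating $f \circ v$ at a label $a$ gives the same tree as evaluating $v$ at $f(a)$; for the hypothetical data above, `evaluate(precompose(div_by_6, V_TREE), 12)` and `evaluate(V_TREE, 2)` coincide. This identity is what allows the second transducer to be run symbolically on right-hand sides of the first one.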

In the following it will be useful to show the occurrences of objects of the form
$q(x_i)$ in the right-hand side of rules of an stt explicitly. Therefore sometimes we write
an arbitrary element of $T^{(k)}_{\mathcal{F}(U \to V)}(Q(X_l))$ in the form $u[q_1(x_{i_1}), \ldots, q_m(x_{i_m})]$, where
$m \ge 0$, $u \in C^{(k)}_{\mathcal{F}(U \to V)}(X_m)$, $q_1, \ldots, q_m \in Q$, and $1 \le i_1, \ldots, i_m \le l$.

We define the syntactic composition $\mathcal{M};\mathcal{N}$ of two s$k$-tt $\mathcal{M}$ and $\mathcal{N}$ by applying $\mathcal{N}$
to the right-hand side of rules of $\mathcal{M}$. However, we can do it only symbolically because
such a right-hand side is built up from functions and not from labels. In fact, we define
a symbolic version of the derivation relation $\Rightarrow_{\mathcal{N}}$, denoted by $\Rrightarrow_{\mathcal{N}}$, which processes trees
over functions. Besides, the rewrite relation $\Rrightarrow_{\mathcal{N}}$ also deals with objects of the form
$q(x_i)$ in its input trees. Moreover, we have to collect the Boolean combinations which
are encountered during the transformation of a right-hand side.

Formally, let $\mathcal{M} = (Q, U, \Phi_1, V, q_0, R_1)$ and $\mathcal{N} = (P, V, \Phi_2, W, p_0, R_2)$ be two s$k$-tt
and $\Phi = \Phi_1 \cup \{f \circ \varphi \mid f \in \mathcal{F}(U \to V), \varphi \in \Phi_2\}$. First, we define the binary relation
$\Rrightarrow_{\mathcal{N}}$ over the set

$$\mathrm{BC}(\Phi) \times T_\Sigma\big(P(T_\Delta(Q(X_l))) \cup (P \times Q)(X_l)\big)$$

where $\Sigma = \mathcal{F}(U \to W)$ and $\Delta = \mathcal{F}(U \to V)$ (cf. Fig. 2). For every

$$(\theta, t), (\theta', t') \in \mathrm{BC}(\Phi) \times T_\Sigma\big(P(T_\Delta(Q(X_l))) \cup (P \times Q)(X_l)\big)$$

we have

$(\theta, t) \Rrightarrow_{\mathcal{N}} (\theta', t')$ iff one of the following two conditions holds:

(i) there is a position $w \in \mathrm{pos}(t)$ such that

  • $t|_w = p(q(x_i))$ for some $p \in P$, $q \in Q$, and $x_i \in X_l$,

  • $t' = t[(p, q)(x_i)]_w$, and

  • $\theta' = \theta$, or

(ii) there is a position $w \in \mathrm{pos}(t)$ and a rule $p(\varphi(x_1, \ldots, x_l)) \to v$ in $R_2$ such that

  • $t|_w = p(f(t_1, \ldots, t_l))$ for some $p \in P$, $f \in \mathcal{F}(U \to V)$, and $t_1, \ldots, t_l \in
    T_{\mathcal{F}(U \to V)}(Q(X_l))$,

  • $t' = t[v']_w$, where $v'$ is obtained from $f \circ v$ by replacing every $p'(x_j) \in P(X_l)$ by
    $p'(t_j)$, and

  • $\theta' = \theta \wedge f \circ \varphi$.

Figure 2: The derivation relation $\Rrightarrow_{\mathcal{N}}$.

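The following statement follows from the definition of the relation $\Rrightarrow_{\mathcal{N}}$.

Lemma 5.6 (Lift lemma.) If $(\varphi, t) \Rrightarrow_{\mathcal{N}} (\theta, t')$, then, for every $a \in [\![\theta]\!]$ we have
$a \in [\![\varphi]\!]$ and $t(a) \Rightarrow_{\mathcal{N}'} t'(a)$, where $\Rightarrow_{\mathcal{N}'}$ is the extension of $\Rightarrow_{\mathcal{N}}$ to the set

$$T_W\big(P(T_V(Q(X_l))) \cup (P \times Q)(X_l)\big),$$

which we obtain by adding the rules $p(q(x_i)) \to (p, q)(x_i)$ to $R_2$ for every $1 \le i \le k$
(cf. p. 195 of [Bak79]).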

Second, we construct the s$k$-tt $\mathcal{M};\mathcal{N} = (P \times Q, U, \Phi, W, (p_0, q_0), R)$, called the
syntactic composition of $\mathcal{M}$ and $\mathcal{N}$, where the set $R$ of rules is defined as follows. If

$$q(\varphi(x_1, \ldots, x_l)) \to u[q_1(x_{i_1}), \ldots, q_m(x_{i_m})] \qquad (1)$$

is a rule in $R_1$, and for some $p \in P$ and $v \in C^{(k)}_{\mathcal{F}(U \to W)}(X_n)$ we have

$$(\varphi, p(u[q_1(x_{i_1}), \ldots, q_m(x_{i_m})])) \ (\Rrightarrow_{\mathcal{N}})^*\ (\theta, v[(p_1, q_{j_1})(x_{i_{j_1}}), \ldots, (p_n, q_{j_n})(x_{i_{j_n}})]) \qquad (2)$$

and $[\![\theta]\!] \ne \emptyset$,

then let the rule

$$(p, q)(\theta(x_1, \ldots, x_l)) \to v[(p_1, q_{j_1})(x_{i_{j_1}}), \ldots, (p_n, q_{j_n})(x_{i_{j_n}})] \qquad (3)$$

be in $R$. Note that

$$\{i_{j_1}, \ldots, i_{j_n}\} \subseteq \{i_1, \ldots, i_m\} \subseteq \{1, \ldots, l\}.$$

We also note that syntactic composition preserves the properties linear, nondeleting,
total, and deterministic. For instance, if both $\mathcal{M}$ and $\mathcal{N}$ are linear, then $\mathcal{M};\mathcal{N}$ is also
linear.

Observation 5.7 The syntactic composition of two alphabetic s$k$-tt is an alphabetic
s$k$-tt.

Proof Let us assume that $\mathcal{M}$ and $\mathcal{N}$ are alphabetic. Then $\varphi$ in (2) is an alphabetic
predicate. Moreover, we observe that if in (i) of the definition of $\Rrightarrow_{\mathcal{N}}$ the predicate $\theta$
is alphabetic, then also $\theta'$ is alphabetic; moreover, if in (ii) of this definition $\theta$ and $\varphi$
are alphabetic and $f$ is a constant function, then either $[\![\theta']\!] = \emptyset$ or $\theta' = \theta$. Therefore,
$\theta$ in (3) is alphabetic. Moreover, by direct inspection of (ii) of the definition of $\Rrightarrow_{\mathcal{N}}$
we can see that $v$ in (2) consists of constant functions over $W$. ■

Now we are able to prove our main composition result, which is in fact the generalization
of [Bak79, Thm. 1].

Theorem 5.8 Let $\mathcal{M}$ and $\mathcal{N}$ be s$k$-tt for which the following two conditions hold:

(a) $\mathcal{M}$ is deterministic or $\mathcal{N}$ is linear, and

(b) $\mathcal{M}$ is total or $\mathcal{N}$ is nondeleting.

Then the s$k$-tt $\mathcal{M};\mathcal{N}$ induces $\mathcal{M} \circ \mathcal{N}$.

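Proof We prove that, for every $\xi \in T_U^{(k)}$, $p \in P$, $q \in Q$, and $\zeta \in T_W^{(k)}$, we have

$$(p, q)(\xi) \Rightarrow^*_{\mathcal{M};\mathcal{N}} \zeta \qquad (4)$$

if and only if

$$\text{there exists an } \eta \in T_V^{(k)} \text{ such that } q(\xi) \Rightarrow^*_{\mathcal{M}} \eta \text{ and } p(\eta) \Rightarrow^*_{\mathcal{N}} \zeta. \qquad (5)$$

The proof can be performed by induction on $\xi$. The proof of the implication (5) $\Rightarrow$
(4) is straightforward, hence we omit it. We note that (as in [Bak79, Thm. 1]) we need
neither condition (a) nor (b) for the proof of this direction.

To prove that (4) $\Rightarrow$ (5), let us assume that (4) holds and that $\xi = a(\xi_1, \ldots, \xi_l)$ for
some $a \in U$, $0 \le l \le k$, and $\xi_1, \ldots, \xi_l \in T_U^{(k)}$.

Let us assume that we applied the rule (3) in the first step of (4). Then (4) can be
written as

$$(p, q)(a(\xi_1, \ldots, \xi_l)) \Rightarrow_{\mathcal{M};\mathcal{N}} v(a)[(p_1, q_{j_1})(\xi_{i_{j_1}}), \ldots, (p_n, q_{j_n})(\xi_{i_{j_n}})] \Rightarrow^*_{\mathcal{M};\mathcal{N}} v(a)[\zeta_1, \ldots, \zeta_n]$$

for some $\zeta_1, \ldots, \zeta_n \in T_W^{(k)}$, where $a \in [\![\theta]\!]$ and $\zeta = v(a)[\zeta_1, \ldots, \zeta_n]$. Hence,

$$(p_1, q_{j_1})(\xi_{i_{j_1}}) \Rightarrow^*_{\mathcal{M};\mathcal{N}} \zeta_1,\ \ldots,\ (p_n, q_{j_n})(\xi_{i_{j_n}}) \Rightarrow^*_{\mathcal{M};\mathcal{N}} \zeta_n,$$

and thus, by the induction hypothesis, there are $\eta_1, \ldots, \eta_n \in T_V^{(k)}$ such that

$$q_{j_1}(\xi_{i_{j_1}}) \Rightarrow^*_{\mathcal{M}} \eta_1 \text{ and } p_1(\eta_1) \Rightarrow^*_{\mathcal{N}} \zeta_1,\ \ldots,\ q_{j_n}(\xi_{i_{j_n}}) \Rightarrow^*_{\mathcal{M}} \eta_n \text{ and } p_n(\eta_n) \Rightarrow^*_{\mathcal{N}} \zeta_n. \qquad (6)$$

Since rule (3) is in $R$, there is a rule of the form (1) in $R_1$ such that the derivation
(2) holds. Hence, by Lemma 5.6, $a \in [\![\varphi]\!]$ and

$$p(u(a)[q_1(x_{i_1}), \ldots, q_m(x_{i_m})]) \ (\Rightarrow_{\mathcal{N}'})^*\ v(a)[(p_1, q_{j_1})(x_{i_{j_1}}), \ldots, (p_n, q_{j_n})(x_{i_{j_n}})]. \qquad (7)$$

Now we define the tree $\eta$. For this, let $1 \le \lambda \le m$. If $\lambda = j_\alpha$ for some $1 \le \alpha \le n$, then
we define $\bar{\eta}_\lambda = \eta_\alpha$. This $\bar{\eta}_\lambda$ is well-defined, which can be seen as follows. Assume that
$j_\alpha = j_\beta$ for some $1 \le \beta \ne \alpha \le n$. Then $\mathcal{N}$ is not linear, and thus by condition (a) $\mathcal{M}$
is deterministic, which implies $\eta_\alpha = \eta_\beta$. Note that by (6)

$$q_{j_\alpha}(\xi_{i_{j_\alpha}}) = q_\lambda(\xi_{i_\lambda}) \Rightarrow^*_{\mathcal{M}} \eta_\alpha = \bar{\eta}_\lambda.$$

If there is no $\alpha$ with $\lambda = j_\alpha$, then $\mathcal{N}$ is deleting and thus by condition (b) $\mathcal{M}$ is total.
Hence, there is a tree $\bar{\eta}_\lambda \in T_V^{(k)}$ such that $q_\lambda(\xi_{i_\lambda}) \Rightarrow^*_{\mathcal{M}} \bar{\eta}_\lambda$.

Let $\eta = u(a)[\bar{\eta}_1, \ldots, \bar{\eta}_m]$. Since the rule (1) is in $R_1$ and $a \in [\![\varphi]\!]$, we have

$$q(a(\xi_1, \ldots, \xi_l)) \Rightarrow_{\mathcal{M}} u(a)[q_1(\xi_{i_1}), \ldots, q_m(\xi_{i_m})] \Rightarrow^*_{\mathcal{M}} u(a)[\bar{\eta}_1, \ldots, \bar{\eta}_m].$$

Moreover, by an obvious modification of (7) and by (6),

$$p(u(a)[\bar{\eta}_1, \ldots, \bar{\eta}_m]) \Rightarrow^*_{\mathcal{N}} v(a)[p_1(\bar{\eta}_{j_1}), \ldots, p_n(\bar{\eta}_{j_n})] = v(a)[p_1(\eta_1), \ldots, p_n(\eta_n)] \Rightarrow^*_{\mathcal{N}} v(a)[\zeta_1, \ldots, \zeta_n].$$ ■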

Due to Observation 5.7 this theorem generalizes [Bak79, Thm. 1].

As an application of the above theorem, we can show that both the class of tree
transformations computed by total and deterministic s$k$-tt and the one computed by
linear and nondeleting s$k$-tt are closed under composition.

Corollary 5.9 (a) td-STT$^{(k)}$ $\circ$ td-STT$^{(k)}$ = td-STT$^{(k)}$
(b) ln-STT$^{(k)}$ $\circ$ ln-STT$^{(k)}$ = ln-STT$^{(k)}$.

Proof We prove only (a) because the proof of (b) is similar. The inclusion from left
to right can be seen as follows. Let $\mathcal{M}$ and $\mathcal{N}$ be total and deterministic s$k$-tt. The
s$k$-tt $\mathcal{M};\mathcal{N}$ is also total and deterministic and, by Theorem 5.8, for the computed tree
transformations $\mathcal{M};\mathcal{N} = \mathcal{M} \circ \mathcal{N}$ holds. The other inclusion follows from the facts
that (i) any tree transformation $\tau \subseteq T_U^{(k)} \times T_V^{(k)}$ can be decomposed as $\tau \circ \iota_{T_V^{(k)}}$ and
(ii) $\iota_{T_V^{(k)}}$ can be computed by a total and deterministic s$k$-tt. ■

6 Forward and backward application of stt

In this section we consider forward and backward application of stt to s-recognizable
tree languages. In particular, we consider the domain and the range of tree transformations
computed by stt. Finally, we apply these results to the problem of (inverse)
type checking.

6.1 Application of stt

We begin with the following result.

Theorem 6.1 dom(STT$^{(k)}$) = REC$^{(k)}$.

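Proof First we prove the inclusion from left to right. For this, let $\mathcal{M} =
(Q, U, \Phi, V, q_0, R)$ be an s$k$-tt. We construct the s$k$-rtg $\mathcal{G} = (\mathcal{P}(Q), U, \Phi, \{q_0\}, R')$
such that $\mathrm{dom}(\mathcal{M}) = L(\mathcal{G})$, where the set $R'$ of rules is defined as follows.

For every $0 \le l \le k$ and $P \subseteq Q$ with $P = \{p_1, \ldots, p_m\}$ for some $m \ge 1$, and rules

$$p_1(\varphi_1(x_1, \ldots, x_l)) \to u_1,\ \ldots,\ p_m(\varphi_m(x_1, \ldots, x_l)) \to u_m \qquad (8)$$

in $R$, let $R'$ contain the rule

$$P \to (\varphi_1 \wedge \ldots \wedge \varphi_m)(P_1, \ldots, P_l)$$

where $P_i = \{q \in Q \mid q(x_i) \text{ occurs in } u_j \text{ for some } 1 \le j \le m\}$. Moreover, as the degenerate
case $P = \emptyset$, for every $0 \le l \le k$, the rule

$$\emptyset \to \top(\emptyset, \ldots, \emptyset)$$

with $l$ occurrences of $\emptyset$ in its right-hand side is in $R'$. Hence $\emptyset \Rightarrow^*_{\mathcal{G}} \xi$ for every $\xi \in T_U^{(k)}$.
We claim that for every $P \subseteq Q$ and $\xi \in T_U^{(k)}$ we have:

$$P \Rightarrow^*_{\mathcal{G}} \xi \ \text{ iff }\ \big(\text{for every } p \in P \text{ there is a } \zeta \in T_V \text{ such that } p(\xi) \Rightarrow^*_{\mathcal{M}} \zeta\big). \qquad (9)$$

The statement is clear for $P = \emptyset$, therefore we assume that $P = \{p_1, \ldots, p_m\}$ for some
$m \ge 1$. We prove (9) by induction on $\xi$.

Let $\xi = a(\xi_1, \ldots, \xi_l)$. Then

$P \Rightarrow_{\mathcal{G}} a(P_1, \ldots, P_l) \Rightarrow^*_{\mathcal{G}} a(\xi_1, \ldots, \xi_l)$

iff there is a rule $P \to (\varphi_1 \wedge \ldots \wedge \varphi_m)(P_1, \ldots, P_l)$ in $R'$ with $a \in \bigcap_{j=1}^m [\![\varphi_j]\!]$
and for every $1 \le i \le l$, $P_i \Rightarrow^*_{\mathcal{G}} \xi_i$

iff there are rules (8) in $R$ with $a \in \bigcap_{j=1}^m [\![\varphi_j]\!]$ and
for every $1 \le i \le l$ and $q \in P_i$ there is a tree $\zeta_{i,q} \in T_V$ s.t. $q(\xi_i) \Rightarrow^*_{\mathcal{M}} \zeta_{i,q}$

iff for every $1 \le j \le m$ there is a rule $p_j(\varphi_j(x_1, \ldots, x_l)) \to u_j$ s.t. $a \in [\![\varphi_j]\!]$
and for every occurrence of $q(x_i)$ in $u_j$ there is a tree $\zeta_{i,q} \in T_V$ s.t. $q(\xi_i) \Rightarrow^*_{\mathcal{M}} \zeta_{i,q}$

iff for every $1 \le j \le m$ there is a tree $\zeta_j \in T_V$ such that $p_j(a(\xi_1, \ldots, \xi_l)) \Rightarrow^*_{\mathcal{M}} \zeta_j$.

Statement (9) with $P = \{q_0\}$ implies $L(\mathcal{G}) = \mathrm{dom}(\mathcal{M})$. Hence, by Theorem 4.4 we
obtain that $\mathrm{dom}(\mathcal{M})$ is s$k$-recognizable. The other inclusion follows from Lemma 5.5. ■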
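The idea behind the powerset construction, namely to track the set of states that must all succeed on the same subtree, can be turned into a direct membership test for the domain. The following Python sketch follows the chain of equivalences used for Statement (9) rather than building the grammar explicitly. Its encoding is an assumption made for illustration: a rule is a tuple `(state, rank, guard, calls)`, where `calls` lists the pairs `(state, i)` such that `state(x_i)` occurs in the right-hand side, and trees are tuples `(label, child, ...)`. The example rules mirror the guards and variable occurrences of Example 5.3.

```python
from itertools import product


def in_domain(states, tree, rules):
    """True iff every state in `states` can translate `tree` into some output."""
    label, children = tree[0], tree[1:]
    # For each required state, collect the rules applicable at the root.
    options = [
        [r for r in rules
         if r[0] == p and r[1] == len(children) and r[2](label)]
        for p in sorted(states)
    ]
    # Choose one applicable rule per state (no choice at all if some state
    # has no applicable rule; exactly one empty choice if `states` is empty).
    for choice in product(*options):
        if all(
            in_domain(
                frozenset(q for r in choice for (q, j) in r[3] if j == i),
                child,
                rules,
            )
            for i, child in enumerate(children, start=1)
        ):
            return True
    return False


# Hypothetical rule set in the spirit of Example 5.3.
div6 = lambda a: a % 6 == 0
top = lambda a: True
RULES = [
    ("q", 2, div6, [("q", 1), ("q", 1)]),   # deletes x_2, copies x_1
    ("q", 2, top, [("q", 1), ("q", 2)]),
    ("q", 0, top, []),
]
```

The empty set of states accepts every tree, which corresponds to the rules for $\emptyset$ in the construction. This matters for deleting rules: a subtree that no state could translate does not block the derivation if the chosen rule never visits it.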

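Example 6.2 We illustrate the construction of the s$k$-rtg $\mathcal{G}$ in the proof of Theorem
6.1 by an example.

Let the s2-tt $\mathcal{M}$ contain the rules

$q_0(\varphi(x_1, x_2)) \to f(p(x_1), q(x_1))$        $p(\theta_1) \to h_1$

$p(\psi(x_1)) \to g(p(x_1), p'(x_1))$        $p'(\theta_2) \to h_2$

$q(\psi'(x_1)) \to g'(p''(x_1))$        $p''(\theta_3) \to h_3$

Then the s2-rtg $\mathcal{G}$ contains (among others) the following rules:

$\{q_0\} \to \varphi(\{p, q\}, \emptyset)$        $\emptyset \to \top$

$\{p, q\} \to (\psi \wedge \psi')(\{p, p', p''\})$        $\emptyset \to \top(\emptyset)$

$\{p, p', p''\} \to (\theta_1 \wedge \theta_2 \wedge \theta_3)$        $\emptyset \to \top(\emptyset, \emptyset)$.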

Now we can prove that backward application of stt preserves recognizability of tree
languages.

Theorem 6.3 (STT$^{(k)}$)$^{-1}$(REC$^{(k)}$) = REC$^{(k)}$.

Proof First we prove the inclusion from left to right. For this, let $\mathcal{M} =
(Q, U, \Phi, V, q_0, R)$ be an s$k$-tt, and $L \subseteq T_V^{(k)}$ an s$k$-recognizable tree language. It
is an elementary fact that $\mathcal{M}^{-1}(L) = \mathrm{dom}(\mathcal{M} \circ \iota_L)$. By Lemma 5.5, there is a linear
and nondeleting s$k$-tt $\mathcal{N}$ with $\mathcal{N} = \iota_L$. Moreover, by Theorem 5.8, the s$k$-tt $\mathcal{M};\mathcal{N}$
induces $\mathcal{M} \circ \mathcal{N}$. Hence $\mathcal{M}^{-1}(L) = \mathrm{dom}(\mathcal{M};\mathcal{N})$, which is s$k$-recognizable by Theorem
6.1.

The other inclusion follows from Lemma 5.5. ■

It is well known from the theory of classical tree automata and tree transducers that
the forward application of linear top-down tree transformations preserves recognizability
of tree languages (see e.g. [Tha69] or [GS84, Ch. IV, Cor. 6.6]). In particular,
the range of every linear top-down tree transformation is a recognizable tree language.
We can show easily that a linear s$k$-tt does not have the analogous property.

Lemma 6.4 There is a linear s1-tt $\mathcal{M}$ such that $\mathrm{range}(\mathcal{M})$ is not 1-recognizable.

Proof Let us assume that $U$ is infinite and define the s1-tt $\mathcal{M} = (\{q\}, U, \{\top\}, U, q, R)$,
where $R$ consists of the only rule

$$q(\top()) \to \iota_U(\iota_U).$$

It is clear that $\mathcal{M}$ induces the 1-tree transformation $\{(a, a(a)) \mid a \in U\}$. Thus
$\mathrm{range}(\mathcal{M}) = \{a(a) \mid a \in U\}$, which is not 1-recognizable by the remark after Lemma
3.6. ■

The non-recognizability of $\mathrm{range}(\mathcal{M})$ above is due to the fact that $\mathcal{M}$ is able to
"duplicate" a node of the input tree by having two occurrences of an appropriate
function symbol on the right-hand side of its rule. We would like to identify a restricted
version of an stt which does not have this capability, in the hope that such an
stt preserves recognizability. Therefore we define simple stt as follows. An s$k$-tt
$\mathcal{M} = (Q, U, \Phi, V, q_0, R)$ is simple if $\mathrm{rhs}(\rho)$ contains exactly one function symbol for
every rule $\rho \in R$. We denote the class of tree transformations computed by simple
and linear stt by sl-STT. Then we can prove the desired result using the following
notation. If $\varphi \in \mathrm{Pred}(U)$ and $f : U \to V$ is a mapping, then $f(\varphi)$ denotes the predicate
defined by $[\![f(\varphi)]\!] = f([\![\varphi]\!])$.

Theorem 6.5 sl-STT$^{(k)}$(REC$^{(k)}$) = REC$^{(k)}$.

Proof First we prove the inclusion from left to right. Let $\mathcal{M} = (Q, U, \Phi, V, q_0, R)$ be
a simple and linear s$k$-tt and $L$ be an s$k$-recognizable tree language such that $L = L(\mathcal{G})$
for some reduced s$k$-rtg $\mathcal{G} = (P, U, \Phi_{\mathcal{G}}, p_0, R_{\mathcal{G}})$ which is in normal form (cf. Theorem
4.4 and Lemma 4.5).

We construct the s$k$-rtg $\mathcal{G}' = (Q \times P, V, \Phi', (q_0, p_0), R')$, where

• $\Phi' = \{f(\varphi \wedge \psi) \mid \varphi \text{ and } f \text{ occur in a rule of } R, \text{ and } \psi \text{ in a rule of } R_{\mathcal{G}}\}$, and

• $R'$ is the smallest set of rules satisfying that if $p \to \psi(p_1, \ldots, p_l)$ is in $R_{\mathcal{G}}$ and
$q(\varphi(x_1, \ldots, x_l)) \to f(q_1(x_{i_1}), \ldots, q_m(x_{i_m}))$ is in $R$, then the rule

$$(q, p) \to f(\varphi \wedge \psi)((q_1, p_{i_1}), \ldots, (q_m, p_{i_m})) \qquad (10)$$

is in $R'$.

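We show that $L(\mathcal{G}') = \mathcal{M}(L)$. For this it suffices to prove the following statement.
For every $q \in Q$, $p \in P$, and $\zeta \in T_V^{(k)}$ we have

$$(q, p) \Rightarrow^*_{\mathcal{G}'} \zeta \iff \exists\, \xi \in L(\mathcal{G}, p) \text{ such that } q(\xi) \Rightarrow^*_{\mathcal{M}} \zeta.$$

We prove only the direction $\Rightarrow$ by induction on the number $n$ of steps of the corresponding
derivation and we show only the induction step $n$ to $n + 1$. The other
direction can be proved in a similar way.

Direction $\Rightarrow$, step $n \to n + 1$: We assume that in the first step of the derivation
we applied the rule (10) obtained from the rules $p \to \psi(p_1, \ldots, p_l)$ in $R_{\mathcal{G}}$ and
$q(\varphi(x_1, \ldots, x_l)) \to f(q_1(x_{i_1}), \ldots, q_m(x_{i_m}))$ in $R$. (Note that $\{i_1, \ldots, i_m\} \subseteq \{1, \ldots, l\}$.)
Then we have

$$(q, p) \Rightarrow_{\mathcal{G}'} b((q_1, p_{i_1}), \ldots, (q_m, p_{i_m})) \Rightarrow^*_{\mathcal{G}'} b(\zeta_1, \ldots, \zeta_m)$$

for some $b \in [\![f(\varphi \wedge \psi)]\!]$ and $\zeta_1, \ldots, \zeta_m \in T_V^{(k)}$. By the I.H., there are trees $\xi_{i_j} \in
L(\mathcal{G}, p_{i_j})$ such that $q_j(\xi_{i_j}) \Rightarrow^*_{\mathcal{M}} \zeta_j$ for every $1 \le j \le m$. Moreover, there is an
$a \in ([\![\varphi]\!] \cap [\![\psi]\!])$ such that $b = f(a)$. Now define the tree $\xi = a(\xi_1, \ldots, \xi_l) \in T_U^{(k)}$,
where $\xi_j$ is the tree chosen above if $j = i_\lambda$ for some $1 \le \lambda \le m$ (it is unique because $\mathcal{M}$ is
linear), and $\xi_j$ is an arbitrary tree in $L(\mathcal{G}, p_j)$ otherwise (such a tree exists because $\mathcal{G}$ is
reduced). Then

$$p \Rightarrow_{\mathcal{G}} a(p_1, \ldots, p_l) \Rightarrow^*_{\mathcal{G}} a(\xi_1, \ldots, \xi_l),$$

hence $\xi \in L(\mathcal{G}, p)$. Moreover

$$q(a(\xi_1, \ldots, \xi_l)) \Rightarrow_{\mathcal{M}} b(q_1(\xi_{i_1}), \ldots, q_m(\xi_{i_m})) \Rightarrow^*_{\mathcal{M}} b(\zeta_1, \ldots, \zeta_m).$$

The inclusion from right to left follows from Lemma 5.5 and the fact that $\mathcal{A}_{=}$ is a
simple and linear stt. ■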

Corollary 6.6 $\mathrm{range}(\text{sl-STT}^{(k)}) = \mathrm{REC}^{(k)}$.

Proof Let $\mathcal{M} = (Q, U, \Phi, V, q_0, R)$ be a simple and linear $s_k$-tt. Obviously,
$\mathrm{range}(\mathcal{M}) = \mathcal{M}(T_U^{(k)})$. Moreover, by Observation 3.3, $T_U^{(k)}$ is $s_k$-recognizable. Hence
the statement follows from Theorem 6.5. ■



6.2 Type checking with stt 

Intuitively, type checking means to verify whether or not all documents in a view have
a certain type. According to [EM03], a typical scenario of type checking is that $\tau$
translates XML documents into HTML documents. Thus, for a set $L$ of XML documents,
$\tau(L)$ is an HTML view of the documents in $L$. In practice, we are interested
in particular XML documents, which turn out to form a recognizable tree language of
unranked trees over some alphabet. Also, certain desired properties of the so-obtained
HTML documents can be described in terms of recognizability of tree languages. Thus,
the type checking problem of $\tau$ in fact means to check whether $\tau(L) \subseteq L'$ for
recognizable tree languages $L$ and $L'$. The inverse type checking problem can be
described in a similar way. The type checking and the inverse type checking problems
for different kinds of transducers were considered in several works, see among others
[MSV03, AMN+03, EM03, MBPS05]. For stt we obtain the following results.

Theorem 6.7 

(a) The inverse type checking problem for stt is decidable. 
(b) The type checking problem for simple and linear stt is decidable.

Proof Both statements follow from the fact that the inclusion problem of
s-recognizable tree languages is decidable. This latter fact can be seen as follows. By
[VB11a, Thm. 3], s-recognizable tree languages are effectively closed under Boolean
operations; for closure under complement, see our correction at the end of Section 3.1.
Moreover, by [VB11a, Thm. 4], the emptiness problem is decidable for s-recognizable
tree languages provided that the emptiness problem in the underlying label structure
is decidable. Since, by our definition, the label structure underlying an $s_k$-ta has a
decidable emptiness problem, we obtain that the inclusion problem of s-recognizable
tree languages is decidable.
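To make the role of the label structure concrete, the following Python sketch decides emptiness for a grammar given in normal form by the usual fixpoint computation of productive nonterminals. It is an illustration under assumptions rather than the procedure of [VB11a]: the rule encoding `(p, psi, kids)`, the function names, and the oracle `is_satisfiable` (standing in for the decidable emptiness problem of the label structure) are hypothetical. In the toy data, predicates are modeled as finite sets of labels, which is a simplification.

```python
def productive_nonterminals(rules, is_satisfiable):
    """Least fixpoint of nonterminals that derive at least one tree.

    rules: tuples (p, psi, kids), encoding p -> psi(p_1, ..., p_l).
    is_satisfiable: oracle deciding whether [[psi]] is nonempty (assumed given).
    """
    productive = set()
    changed = True
    while changed:
        changed = False
        for p, psi, kids in rules:
            if (p not in productive
                    and is_satisfiable(psi)
                    and all(k in productive for k in kids)):
                productive.add(p)
                changed = True
    return productive

def is_empty(rules, start, is_satisfiable):
    """The generated language is empty iff the start nonterminal is unproductive."""
    return start not in productive_nonterminals(rules, is_satisfiable)

# Toy label structure: a predicate is a frozenset of labels (an assumption).
sat = lambda psi: len(psi) > 0
toy_rules = [
    ("p0", frozenset({1, 2}), ["p1", "p1"]),
    ("p1", frozenset({3}), []),
    ("p2", frozenset(), []),  # unsatisfiable predicate, so p2 derives nothing
]
```

With these rules, `is_empty(toy_rules, "p0", sat)` is false and `is_empty(toy_rules, "p2", sat)` is true; the only place the label structure enters is the call to the oracle.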

Then the proof of (a) is as follows. Let $\mathcal{M} : T_U^{(k)} \to T_V^{(k)}$ be an $s_k$-tt and $L' \subseteq T_U^{(k)}$
and $L \subseteq T_V^{(k)}$ s-recognizable tree languages. By Theorem 6.3, the tree language
$\mathcal{M}^{-1}(L)$ is effectively $s_k$-recognizable, thus we can decide if $\mathcal{M}^{-1}(L) \subseteq L'$ holds or
not. Statement (b) can be proved in a similar way, using Theorem 6.5. ■
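Spelled out, both decision procedures reduce an inclusion test to an emptiness test via the Boolean closure properties. The following display is a sketch of that reduction; the names $K$ and $K'$ for the languages in part (b) are introduced here only for illustration.

```latex
% (a) inverse type checking, with L' \subseteq T_U^{(k)} and L \subseteq T_V^{(k)}:
\[
  \mathcal{M}^{-1}(L) \subseteq L'
  \iff
  \mathcal{M}^{-1}(L) \cap \bigl(T_U^{(k)} \setminus L'\bigr) = \emptyset .
\]
% (b) type checking for simple and linear \mathcal{M},
%     with K \subseteq T_U^{(k)} and K' \subseteq T_V^{(k)}:
\[
  \mathcal{M}(K) \subseteq K'
  \iff
  \mathcal{M}(K) \cap \bigl(T_V^{(k)} \setminus K'\bigr) = \emptyset .
\]
```

In (a) the left operand is s-recognizable by Theorem 6.3, in (b) by Theorem 6.5; the complement and the intersection are then s-recognizable by Boolean closure, so emptiness can be decided.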

7 Conclusion and an open problem 

In this paper we have further elaborated the theory of sta and stt. Our main
contributions are: the characterization of s-recognizable tree languages in terms of relabelings
of recognizable tree languages, the introduction of symbolic regular tree grammars
and the proof of their equivalence to sta, the comparison of sta and variable tree
automata, the composition of stt, and the forward and backward application of stt to
s-recognizable tree languages.

Finally, we mention an open problem. In the definition of simple $s_k$-tt we required
that the right-hand side of each rule contains exactly one function symbol. We
conjecture that, for the closure result in Theorem 6.5, it is sufficient to require that right-hand
sides of rules contain at most one function